[jira] [Updated] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2019-01-12 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-25572:
-
Fix Version/s: 2.3.3

> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>    Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.3.3, 2.4.0
>
>
> Follow-up to SPARK-24255.
> From the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements when running tests - we have seen cases where SparkR is run on 
> Java 10, which unfortunately Spark does not start on. For 2.4.x, let's attempt 
> to skip all tests.
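
The actual change lives in SparkR's R test harness, so the following is only an illustration of the idea: detect the JVM major version up front and skip the whole suite when Spark cannot start on it. The helper and test names below are hypothetical, sketched in Python for brevity.

import re
import subprocess
import unittest

def java_major_version():
    # `java -version` writes to stderr, e.g. 'openjdk version "10.0.2" 2018-07-17'
    try:
        out = subprocess.run(["java", "-version"],
                             capture_output=True, text=True).stderr
    except FileNotFoundError:
        return None
    m = re.search(r'version "(\d+)(?:\.(\d+))?', out)
    if m is None:
        return None
    major = int(m.group(1))
    # Java 8 and earlier report themselves as "1.8.0_181" etc.
    return int(m.group(2)) if major == 1 and m.group(2) else major

SUPPORTED_JAVA = {8}  # Spark 2.x only starts on Java 8

@unittest.skipUnless(java_major_version() in SUPPORTED_JAVA,
                     "unsupported Java version; skipping the whole suite")
class SmokeTest(unittest.TestCase):
    def test_spark_starts(self):
        self.assertTrue(True)  # placeholder for a real SparkSession smoke test

if __name__ == "__main__":
    unittest.main()

With a guard like this, CRAN-style test runs on Java 10 would report skipped tests instead of failing when the JVM refuses to start.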



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26010) SparkR vignette fails on CRAN on Java 11

2019-01-12 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26010:
-
Fix Version/s: 2.3.3

> SparkR vignette fails on CRAN on Java 11
> 
>
> Key: SPARK-26010
> URL: https://issues.apache.org/jira/browse/SPARK-26010
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>    Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> Follow-up to SPARK-25572, but for vignettes.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: spark2.4 arrow enabled true,error log not returned

2019-01-12 Thread Felix Cheung
Do you mean you ran the same code on YARN and on standalone? Can you check whether they 
are running the same Python versions?



From: Bryan Cutler 
Sent: Thursday, January 10, 2019 5:29 PM
To: libinsong1...@gmail.com
Cc: zlist Spark
Subject: Re: spark2.4 arrow enabled true,error log not returned

Hi, could you please clarify if you are running a YARN cluster when you see 
this problem?  I tried on Spark standalone and could not reproduce.  If it's on 
a YARN cluster, please file a JIRA and I can try to investigate further.

Thanks,
Bryan

On Sat, Dec 15, 2018 at 3:42 AM 李斌松 <libinsong1...@gmail.com> wrote:
With spark 2.4 and arrow enabled (true), the error log is not returned; in Spark 2.3 there 
was no such problem.

1. spark.sql.execution.arrow.enabled=true
[image.png]
YARN log:

18/12/15 14:35:52 INFO CodeGenerator: Code generated in 1030.698785 ms
18/12/15 14:35:54 INFO PythonRunner: Times: total = 1985, boot = 1892, init = 92, finish = 1
18/12/15 14:35:54 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1799 bytes result sent to driver
18/12/15 14:35:55 INFO CoarseGrainedExecutorBackend: Got assigned task 1
18/12/15 14:35:55 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
18/12/15 14:35:55 INFO TorrentBroadcast: Started reading broadcast variable 1
18/12/15 14:35:55 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 8.3 KB, free 1048.8 MB)
18/12/15 14:35:55 INFO TorrentBroadcast: Reading broadcast variable 1 took 18 ms
18/12/15 14:35:55 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 14.0 KB, free 1048.8 MB)
18/12/15 14:35:55 INFO CodeGenerator: Code generated in 30.269745 ms
18/12/15 14:35:55 INFO PythonRunner: Times: total = 13, boot = 5, init = 7, finish = 1
18/12/15 14:35:55 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1893 bytes result sent to driver
18/12/15 14:35:55 INFO CoarseGrainedExecutorBackend: Got assigned task 2
18/12/15 14:35:55 INFO Executor: Running task 1.0 in stage 1.0 (TID 2)
18/12/15 14:35:55 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 2)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/install/pyspark/2.4.0/pyspark.zip/pyspark/worker.py", line 377, in main
    process()
  File "/usr/install/pyspark/2.4.0/pyspark.zip/pyspark/worker.py", line 372, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/usr/install/pyspark/2.4.0/pyspark.zip/pyspark/serializers.py", line 390, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/usr/install/pyspark/2.4.0/pyspark.zip/pyspark/util.py", line 99, in wrapper
    return f(*args, **kwargs)
  File "/yarn/nm/usercache/admin/appcache/application_1544579748138_0215/container_e43_1544579748138_0215_01_01/python1.py", line 435, in mapfunc
ValueError: could not convert string to float: 'a'

at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:588)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:571)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.hasNext(ArrowConverters.scala:99)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.foreach(ArrowConverters.scala:97)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.to(ArrowConverters.scala:97)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.toBuffer(ArrowConverters.scala:97)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.toArray(ArrowConverters.scala:97)
at org.apache.spark.sql.Dataset$$anonfun$collectAsArrowToPython$1$$anonfun$apply$17$$anonfun$apply$18.apply(Dataset.scala:3314)
at org.apache.spark.sql.Dataset$$anonfun$collectAsArrowToPython$1$$anonfun$apply$17$$anonfun$apply$18.apply(Dataset.scala:3314)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
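
For reference, a minimal local sketch that triggers the same class of failure (assuming PySpark 2.4 with pyarrow installed; the column name and UDF below are made up for illustration, not taken from the original job):

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

spark = (SparkSession.builder
         .master("local[2]")
         .appName("arrow-error-log-repro")
         .getOrCreate())

# Same flag the original report toggles
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

# A Python UDF that raises ValueError on the non-numeric row,
# mirroring the worker traceback above
to_float = udf(lambda s: float(s), DoubleType())

df = (spark.createDataFrame([("1.5",), ("a",)], ["val"])
      .select(to_float("val").alias("f")))

# The question in this thread is whether the ValueError surfaces in the driver
# output here, or only in the executor (YARN) logs when Arrow is enabled
df.toPandas()

Running the same snippet with the flag set to "false" gives a quick comparison of what the driver reports in each mode.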

Re: [NOTICE] Mandatory migration of git repositories to gitbox.apache.org

2019-01-07 Thread Felix Cheung
Thanks!


From: Animesh Trivedi 
Sent: Monday, January 7, 2019 1:42:52 AM
To: dev@crail.apache.org
Subject: Re: [NOTICE] Mandatory migration of git repositories to 
gitbox.apache.org

Migration is finished.

On Mon, Jan 7, 2019 at 10:22 AM Animesh Trivedi 
wrote:

> Done: https://issues.apache.org/jira/browse/INFRA-17570
>
> Cheers,
> --
> Animesh
>
> On Sun, Jan 6, 2019 at 9:13 PM Felix Cheung 
> wrote:
>
>> Great, thanks!
>>
>> Could someone take the lead to create the Infra JIRA? (All it needs is to
>> reference this mail thread from lists.apache.org)
>>
>> It should be very simple - once that is done there’s a small step for
>> committers to connect their accounts on gitbox, that’s it.
>>
>>
>> 
>> From: Patrick Stuedi 
>> Sent: Sunday, January 6, 2019 11:47 AM
>> To: dev@crail.apache.org
>> Subject: Re: [NOTICE] Mandatory migration of git repositories to
>> gitbox.apache.org
>>
>> +1 from me too, we should create the infra JIRA as Luciano suggested.
>> Jonas has interacted with the infra team in the past.
>>
>> On Sun, Jan 6, 2019 at 8:09 PM Felix Cheung 
>> wrote:
>> >
>> > Hi there - any more vote/comment?
>> >
>> >
>> > 
>> > From: Luciano Resende 
>> > Sent: Friday, January 4, 2019 5:44 AM
>> > To: dev@crail.apache.org
>> > Subject: Re: [NOTICE] Mandatory migration of git repositories to
>> gitbox.apache.org
>> >
>> > Hey all, please use this thread if you have any concerns/questions
>> > about this move, otherwise please create a INFRA Jira issue and
>> > reference this thread.
>> >
>> > On Thu, Jan 3, 2019 at 8:19 AM Apache Infrastructure Team
>> >  wrote:
>> > >
>> > > Hello, crail folks.
>> > > As stated earlier in 2018, all git repositories must be migrated from
>> > > the git-wip-us.apache.org URL to gitbox.apache.org, as the old
>> service
>> > > is being decommissioned. Your project is receiving this email because
>> > > you still have repositories on git-wip-us that needs to be migrated.
>> > >
>> > > The following repositories on git-wip-us belong to your project:
>> > > - incubator-crail.git
>> > > - incubator-crail-website.git
>> > >
>> > >
>> > > We are now entering the mandated (coordinated) move stage of the
>> roadmap,
>> > > and you are asked to please coordinate migration with the Apache
>> > > Infrastructure Team before February 7th. All repositories not migrated
>> > > on February 7th will be mass migrated without warning, and we'd
>> appreciate
>> > > it if we could work together to avoid a big mess that day :-).
>> > >
>> > > Moving to gitbox means you will get full write access on GitHub as
>> well,
>> > > and be able to close/merge pull requests and much more.
>> > >
>> > > To have your repositories moved, please follow these steps:
>> > >
>> > > - Ensure consensus on the move (a link to a lists.apache.org thread
>> will
>> > > suffice for us as evidence).
>> > > - Create a JIRA ticket at https://issues.apache.org/jira/browse/INFRA
>> > >
>> > > Your migration should only take a few minutes. If you wish to migrate
>> > > at a specific time of day or date, please do let us know in the
>> ticket.
>> > >
>> > > As always, we appreciate your understanding and patience as we move
>> > > things around and work to provide better services and features for
>> > > the Apache Family.
>> > >
>> > > Should you wish to contact us with feedback or questions, please do so
>> > > at: us...@infra.apache.org.
>> > >
>> > >
>> > > With regards,
>> > > Apache Infrastructure
>> > >
>> >
>> >
>> > --
>> > Luciano Resende
>> > http://twitter.com/lresende1975
>> > http://lresende.blogspot.com/
>>
>


Re: [NOTICE] Mandatory migration of git repositories to gitbox.apache.org

2019-01-06 Thread Felix Cheung
Great, thanks!

Could someone take the lead to create the Infra JIRA? (All it needs is to 
reference this mail thread from lists.apache.org)

It should be very simple - once that is done there’s a small step for 
committers to connect their accounts on gitbox, that’s it.



From: Patrick Stuedi 
Sent: Sunday, January 6, 2019 11:47 AM
To: dev@crail.apache.org
Subject: Re: [NOTICE] Mandatory migration of git repositories to 
gitbox.apache.org

+1 from me too, we should create the infra JIRA as Luciano suggested.
Jonas has interacted with the infra team in the past.

On Sun, Jan 6, 2019 at 8:09 PM Felix Cheung  wrote:
>
> Hi there - any more vote/comment?
>
>
> 
> From: Luciano Resende 
> Sent: Friday, January 4, 2019 5:44 AM
> To: dev@crail.apache.org
> Subject: Re: [NOTICE] Mandatory migration of git repositories to 
> gitbox.apache.org
>
> Hey all, please use this thread if you have any concerns/questions
> about this move, otherwise please create a INFRA Jira issue and
> reference this thread.
>
> On Thu, Jan 3, 2019 at 8:19 AM Apache Infrastructure Team
>  wrote:
> >
> > Hello, crail folks.
> > As stated earlier in 2018, all git repositories must be migrated from
> > the git-wip-us.apache.org URL to gitbox.apache.org, as the old service
> > is being decommissioned. Your project is receiving this email because
> > you still have repositories on git-wip-us that needs to be migrated.
> >
> > The following repositories on git-wip-us belong to your project:
> > - incubator-crail.git
> > - incubator-crail-website.git
> >
> >
> > We are now entering the mandated (coordinated) move stage of the roadmap,
> > and you are asked to please coordinate migration with the Apache
> > Infrastructure Team before February 7th. All repositories not migrated
> > on February 7th will be mass migrated without warning, and we'd appreciate
> > it if we could work together to avoid a big mess that day :-).
> >
> > Moving to gitbox means you will get full write access on GitHub as well,
> > and be able to close/merge pull requests and much more.
> >
> > To have your repositories moved, please follow these steps:
> >
> > - Ensure consensus on the move (a link to a lists.apache.org thread will
> > suffice for us as evidence).
> > - Create a JIRA ticket at https://issues.apache.org/jira/browse/INFRA
> >
> > Your migration should only take a few minutes. If you wish to migrate
> > at a specific time of day or date, please do let us know in the ticket.
> >
> > As always, we appreciate your understanding and patience as we move
> > things around and work to provide better services and features for
> > the Apache Family.
> >
> > Should you wish to contact us with feedback or questions, please do so
> > at: us...@infra.apache.org.
> >
> >
> > With regards,
> > Apache Infrastructure
> >
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/


Re: Spark Packaging Jenkins

2019-01-06 Thread Felix Cheung
Awesome Shane!



From: shane knapp 
Sent: Sunday, January 6, 2019 11:38 AM
To: Felix Cheung
Cc: Dongjoon Hyun; Wenchen Fan; dev
Subject: Re: Spark Packaging Jenkins

noted.  i like the idea of building (but not signing) the release and will 
update the job(s) this week.

On Sun, Jan 6, 2019 at 11:22 AM Felix Cheung 
mailto:felixcheun...@hotmail.com>> wrote:
https://spark.apache.org/release-process.html

Look for do-release-docker.sh script



From: Felix Cheung mailto:felixcheun...@hotmail.com>>
Sent: Sunday, January 6, 2019 11:17 AM
To: Dongjoon Hyun; Wenchen Fan
Cc: dev; shane knapp
Subject: Re: Spark Packaging Jenkins

The release process doc should have been updated on this - as mentioned we do 
not use Jenkins for release signing (take this offline if further discussion is 
needed)

The release build on Jenkins can still be useful for pre-validating the release 
build process (without actually signing it)



From: Dongjoon Hyun mailto:dongjoon.h...@gmail.com>>
Sent: Saturday, January 5, 2019 9:46 PM
To: Wenchen Fan
Cc: dev; shane knapp
Subject: Re: Spark Packaging Jenkins

Thank you, Wenchen.

I see. I'll update the doc and proceed to the next step manually as you advise. 
And it seems that we can stop the outdated Jenkins jobs, too.

Bests,
Dongjoon.

On Sat, Jan 5, 2019 at 20:15 Wenchen Fan 
mailto:cloud0...@gmail.com>> wrote:
IIRC there was a change to the release process: we stopped using the shared gpg 
key on Jenkins and now use the personal key of the release manager. I'm not sure 
Jenkins can help with testing the packaging anymore.

BTW, the release manager needs to run the packaging script themselves. If there is a 
problem, the release manager will find it sooner or later.



On Sun, Jan 6, 2019 at 6:34 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Hi, All.

It turns out that `gpg signing` is the next hurdle in Spark Packaging Jenkins.
Has something changed on our Jenkins machine since the 2.4.0 release?

  gpg: skipped "/home/jenkins/workspace/spark-master-package/spark-utils/new-release-scripts/jenkins/jenkins-credentials-JEtz0nyn/gpg.tmp": No secret key
  gpg: signing failed: No secret key

Bests,
Dongjoon.


On Fri, Jan 4, 2019 at 11:52 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
https://issues.apache.org/jira/browse/SPARK-26537

On Fri, Jan 4, 2019 at 11:31 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
this may push into early next week...  these builds were set up before my 
time, and i'm currently unraveling how they all work before pushing a commit to 
fix stuff.

nothing like some code archaeology to make my friday more exciting!  :)

shane

On Fri, Jan 4, 2019 at 11:08 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Thank you, Shane!

Bests,
Dongjoon.

On Fri, Jan 4, 2019 at 10:50 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
yeah, i'll get on that today.  thanks for the heads up.

On Fri, Jan 4, 2019 at 10:46 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Hi, All

As a part of release process, we need to check Packaging/Compile/Test Jenkins 
status.

http://spark.apache.org/release-process.html

1. Spark Packaging: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/
2. Spark QA Compile: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/
3. Spark QA Test: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/

Currently, (2) and (3) are working because they use GitHub 
(https://github.com/apache/spark.git).
But (1) seems to be broken because it's looking for the old 
repo (https://git-wip-us.apache.org/repos/asf/spark.git/info/refs) instead of 
the new GitBox one.

Can we fix this this week?

Bests,
Dongjoon.



--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Spark Packaging Jenkins

2019-01-06 Thread Felix Cheung
https://spark.apache.org/release-process.html

Look for do-release-docker.sh script



From: Felix Cheung 
Sent: Sunday, January 6, 2019 11:17 AM
To: Dongjoon Hyun; Wenchen Fan
Cc: dev; shane knapp
Subject: Re: Spark Packaging Jenkins

The release process doc should have been updated on this - as mentioned we do 
not use Jenkins for release signing (take this offline if further discussion is 
needed)

The release build on Jenkins can still be useful for pre-validating the release 
build process (without actually signing it)



From: Dongjoon Hyun 
Sent: Saturday, January 5, 2019 9:46 PM
To: Wenchen Fan
Cc: dev; shane knapp
Subject: Re: Spark Packaging Jenkins

Thank you, Wenchen.

I see. I'll update the doc and proceed to the next step manually as you advise. 
And it seems that we can stop the outdated Jenkins jobs, too.

Bests,
Dongjoon.

On Sat, Jan 5, 2019 at 20:15 Wenchen Fan 
mailto:cloud0...@gmail.com>> wrote:
IIRC there was a change to the release process: we stopped using the shared gpg 
key on Jenkins and now use the personal key of the release manager. I'm not sure 
Jenkins can help with testing the packaging anymore.

BTW, the release manager needs to run the packaging script themselves. If there is a 
problem, the release manager will find it sooner or later.



On Sun, Jan 6, 2019 at 6:34 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Hi, All.

It turns out that `gpg signing` is the next hurdle in Spark Packaging Jenkins.
Has something changed on our Jenkins machine since the 2.4.0 release?

  gpg: skipped "/home/jenkins/workspace/spark-master-package/spark-utils/new-release-scripts/jenkins/jenkins-credentials-JEtz0nyn/gpg.tmp": No secret key
  gpg: signing failed: No secret key

Bests,
Dongjoon.


On Fri, Jan 4, 2019 at 11:52 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
https://issues.apache.org/jira/browse/SPARK-26537

On Fri, Jan 4, 2019 at 11:31 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
this may push into early next week...  these builds were set up before my 
time, and i'm currently unraveling how they all work before pushing a commit to 
fix stuff.

nothing like some code archaeology to make my friday more exciting!  :)

shane

On Fri, Jan 4, 2019 at 11:08 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Thank you, Shane!

Bests,
Dongjoon.

On Fri, Jan 4, 2019 at 10:50 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
yeah, i'll get on that today.  thanks for the heads up.

On Fri, Jan 4, 2019 at 10:46 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Hi, All

As a part of release process, we need to check Packaging/Compile/Test Jenkins 
status.

http://spark.apache.org/release-process.html

1. Spark Packaging: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/
2. Spark QA Compile: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/
3. Spark QA Test: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/

Currently, (2) and (3) are working because they use GitHub 
(https://github.com/apache/spark.git).
But (1) seems to be broken because it's looking for the old 
repo (https://git-wip-us.apache.org/repos/asf/spark.git/info/refs) instead of 
the new GitBox one.

Can we fix this this week?

Bests,
Dongjoon.



--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Spark Packaging Jenkins

2019-01-06 Thread Felix Cheung
The release process doc should have been updated on this - as mentioned we do 
not use Jenkins for release signing (take this offline if further discussion is 
needed)

The release build on Jenkins can still be useful for pre-validating the release 
build process (without actually signing it)
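
Purely as an illustration of that "build but don't sign" idea (the key id and helper functions below are hypothetical placeholders, not the actual release scripts), a job could guard the signing step on whether a secret key is present:

import subprocess

def has_secret_key(key_id: str) -> bool:
    # gpg exits non-zero when no matching secret key exists in the keyring
    result = subprocess.run(
        ["gpg", "--batch", "--list-secret-keys", key_id],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def build_artifacts() -> None:
    print("building source and binary packages (dry run)")  # placeholder step

def sign_artifacts(key_id: str) -> None:
    print("signing artifacts with key " + key_id)  # placeholder step

def package_release(key_id: str = "RELEASE-MANAGER-KEY-ID") -> None:
    build_artifacts()
    if has_secret_key(key_id):
        sign_artifacts(key_id)
    else:
        # Matches the failure quoted below: no shared key on Jenkins anymore,
        # so the job validates the build and skips signing
        print("no secret key available; built but did not sign (pre-validation only)")

if __name__ == "__main__":
    package_release()

The signing itself would still happen on the release manager's machine with their personal key, per the updated release process.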



From: Dongjoon Hyun 
Sent: Saturday, January 5, 2019 9:46 PM
To: Wenchen Fan
Cc: dev; shane knapp
Subject: Re: Spark Packaging Jenkins

Thank you, Wenchen.

I see. I'll update the doc and proceed to the next step manually as you advise. 
And it seems that we can stop the outdated Jenkins jobs, too.

Bests,
Dongjoon.

On Sat, Jan 5, 2019 at 20:15 Wenchen Fan 
mailto:cloud0...@gmail.com>> wrote:
IIRC there was a change to the release process: we stopped using the shared gpg 
key on Jenkins and now use the personal key of the release manager. I'm not sure 
Jenkins can help with testing the packaging anymore.

BTW, the release manager needs to run the packaging script themselves. If there is a 
problem, the release manager will find it sooner or later.



On Sun, Jan 6, 2019 at 6:34 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Hi, All.

It turns out that `gpg signing` is the next hurdle in Spark Packaging Jenkins.
Has something changed on our Jenkins machine since the 2.4.0 release?

  gpg: skipped "/home/jenkins/workspace/spark-master-package/spark-utils/new-release-scripts/jenkins/jenkins-credentials-JEtz0nyn/gpg.tmp": No secret key
  gpg: signing failed: No secret key

Bests,
Dongjoon.


On Fri, Jan 4, 2019 at 11:52 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
https://issues.apache.org/jira/browse/SPARK-26537

On Fri, Jan 4, 2019 at 11:31 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
this may push into early next week...  these builds were set up before my 
time, and i'm currently unraveling how they all work before pushing a commit to 
fix stuff.

nothing like some code archaeology to make my friday more exciting!  :)

shane

On Fri, Jan 4, 2019 at 11:08 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Thank you, Shane!

Bests,
Dongjoon.

On Fri, Jan 4, 2019 at 10:50 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
yeah, i'll get on that today.  thanks for the heads up.

On Fri, Jan 4, 2019 at 10:46 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Hi, All

As a part of release process, we need to check Packaging/Compile/Test Jenkins 
status.

http://spark.apache.org/release-process.html

1. Spark Packaging: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/
2. Spark QA Compile: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/
3. Spark QA Test: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/

Currently, (2) and (3) are working because they use GitHub 
(https://github.com/apache/spark.git).
But (1) seems to be broken because it's looking for the old 
repo (https://git-wip-us.apache.org/repos/asf/spark.git/info/refs) instead of 
the new GitBox one.

Can we fix this this week?

Bests,
Dongjoon.



--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [NOTICE] Mandatory migration of git repositories to gitbox.apache.org

2019-01-06 Thread Felix Cheung
Hi there - any more vote/comment?



From: Luciano Resende 
Sent: Friday, January 4, 2019 5:44 AM
To: dev@crail.apache.org
Subject: Re: [NOTICE] Mandatory migration of git repositories to 
gitbox.apache.org

Hey all, please use this thread if you have any concerns/questions
about this move, otherwise please create an INFRA Jira issue and
reference this thread.

On Thu, Jan 3, 2019 at 8:19 AM Apache Infrastructure Team
 wrote:
>
> Hello, crail folks.
> As stated earlier in 2018, all git repositories must be migrated from
> the git-wip-us.apache.org URL to gitbox.apache.org, as the old service
> is being decommissioned. Your project is receiving this email because
> you still have repositories on git-wip-us that needs to be migrated.
>
> The following repositories on git-wip-us belong to your project:
> - incubator-crail.git
> - incubator-crail-website.git
>
>
> We are now entering the mandated (coordinated) move stage of the roadmap,
> and you are asked to please coordinate migration with the Apache
> Infrastructure Team before February 7th. All repositories not migrated
> on February 7th will be mass migrated without warning, and we'd appreciate
> it if we could work together to avoid a big mess that day :-).
>
> Moving to gitbox means you will get full write access on GitHub as well,
> and be able to close/merge pull requests and much more.
>
> To have your repositories moved, please follow these steps:
>
> - Ensure consensus on the move (a link to a lists.apache.org thread will
> suffice for us as evidence).
> - Create a JIRA ticket at https://issues.apache.org/jira/browse/INFRA
>
> Your migration should only take a few minutes. If you wish to migrate
> at a specific time of day or date, please do let us know in the ticket.
>
> As always, we appreciate your understanding and patience as we move
> things around and work to provide better services and features for
> the Apache Family.
>
> Should you wish to contact us with feedback or questions, please do so
> at: us...@infra.apache.org.
>
>
> With regards,
> Apache Infrastructure
>


--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


Re: [NOTICE] Mandatory migration of git repositories to gitbox.apache.org

2019-01-03 Thread Felix Cheung
Hi there - please see the notice below. I will kick this off as a “discuss” thread

+1 for the move



From: Apache Infrastructure Team 
Sent: Thursday, January 3, 2019 5:19 AM
To: dev@crail.apache.org
Subject: [NOTICE] Mandatory migration of git repositories to gitbox.apache.org

Hello, crail folks.
As stated earlier in 2018, all git repositories must be migrated from
the git-wip-us.apache.org URL to gitbox.apache.org, as the old service
is being decommissioned. Your project is receiving this email because
you still have repositories on git-wip-us that need to be migrated.

The following repositories on git-wip-us belong to your project:
- incubator-crail.git
- incubator-crail-website.git


We are now entering the mandated (coordinated) move stage of the roadmap,
and you are asked to please coordinate migration with the Apache
Infrastructure Team before February 7th. All repositories not migrated
on February 7th will be mass migrated without warning, and we'd appreciate
it if we could work together to avoid a big mess that day :-).

Moving to gitbox means you will get full write access on GitHub as well,
and be able to close/merge pull requests and much more.

To have your repositories moved, please follow these steps:

- Ensure consensus on the move (a link to a lists.apache.org thread will
suffice for us as evidence).
- Create a JIRA ticket at https://issues.apache.org/jira/browse/INFRA

Your migration should only take a few minutes. If you wish to migrate
at a specific time of day or date, please do let us know in the ticket.

As always, we appreciate your understanding and patience as we move
things around and work to provide better services and features for
the Apache Family.

Should you wish to contact us with feedback or questions, please do so
at: us...@infra.apache.org.


With regards,
Apache Infrastructure



Re: Apache Spark 2.2.3 ?

2019-01-02 Thread Felix Cheung
+1 on 2.2.3 of course



From: Dongjoon Hyun 
Sent: Wednesday, January 2, 2019 12:21 PM
To: Saisai Shao
Cc: Xiao Li; Felix Cheung; Sean Owen; dev
Subject: Re: Apache Spark 2.2.3 ?

Thank you for the swift feedback, and Happy New Year. :)
For a 2.2.3 release next week, I see two positive opinions (including mine)
and no direct objections.

Apache Spark has a mature, resourceful, and fast-growing community.
One of the important characteristics of a mature community is
predictable behavior that users can depend on.
For instance, we have a nice tradition of cutting the branch as a sign of feature 
freeze.
The *final* release of a branch is not only good for end users, but also a 
clear sign to everyone that the branch has reached EOL.

As a junior committer in the community, I want to contribute by delivering the 
final 2.2.3 release to the community and finalizing `branch-2.2`.

* For Apache Spark JIRA, I checked that there are no ongoing issues targeting 
`2.2.3` (one way to double-check this is sketched below).
* For commits, I reviewed the commits newly landed after the `2.2.2` tag and 
updated a few missing JIRA issues accordingly.
* So it appears we can release 2.2.3 next week.
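
As a side note, one way to repeat that JIRA check (assuming the public JIRA REST search endpoint and the Python `requests` library; this is not part of the release tooling) is:

import requests

JIRA_SEARCH = "https://issues.apache.org/jira/rest/api/2/search"
jql = 'project = SPARK AND fixVersion = "2.2.3" AND statusCategory != Done'

# Ask JIRA for any unresolved issue still targeting the 2.2.3 fix version
resp = requests.get(JIRA_SEARCH, params={"jql": jql, "fields": "key,summary,status"})
resp.raise_for_status()
issues = resp.json().get("issues", [])

print("%d unresolved issue(s) still targeting 2.2.3" % len(issues))
for issue in issues:
    print(issue["key"], "-", issue["fields"]["summary"])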

BTW, I'm +1 for the next 2.3/2.4 releases and have been expecting them before 
Spark+AI Summit (April), as we have usually done.
Please send another email to the `dev` mailing list, since it's worth getting 
more attention and requests there.

Bests,
Dongjoon.


On Tue, Jan 1, 2019 at 9:35 PM Saisai Shao <sai.sai.s...@gmail.com> wrote:
Agreed on having a new branch-2.3 release, as we have already accumulated several 
fixes.

Thanks
Saisai

Xiao Li <lix...@databricks.com> wrote on Wed, Jan 2, 2019 at 1:32 PM:
Based on the commit history, 
https://gitbox.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.3
 contains more critical fixes. Maybe the priority is higher?

On Tue, Jan 1, 2019 at 9:22 PM Felix Cheung <felixcheun...@hotmail.com> wrote:
Speaking of, it’s been 3 months since 2.3.2... (Sept 2018)

And 2 months since 2.4.0 (Nov 2018) - does the community feel 2.4 branch is 
stabilizing?



From: Sean Owen <sro...@gmail.com>
Sent: Tuesday, January 1, 2019 8:30 PM
To: Dongjoon Hyun
Cc: dev
Subject: Re: Apache Spark 2.2.3 ?

I agree with that logic, and if you're volunteering to do the legwork,
I don't see a reason not to cut a final 2.2 release.

On Tue, Jan 1, 2019 at 9:19 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
> Hi, All.
>
> Apache Spark community has a policy maintaining the feature branch for 18 
> months. I think it's time for the 2.2.3 release since 2.2.0 is released on 
> July 2017.
>
> http://spark.apache.org/versioning-policy.html
>
> After 2.2.2 (July 2018), `branch-2.2` has 40 patches (including security 
> patches).
>
> https://gitbox.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.2
>
> If it's okay and there is no further plan on `branch-2.2`, I want to 
> volunteer to prepare the first RC (early next week?).
>
> Please let me know your opinions about this.
>
> Bests,
> Dongjoon.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org





Re: Apache Spark 2.2.3 ?

2019-01-01 Thread Felix Cheung
Speaking of, it’s been 3 months since 2.3.2... (Sept 2018)

And 2 months since 2.4.0 (Nov 2018) - does the community feel 2.4 branch is 
stabilizing?



From: Sean Owen 
Sent: Tuesday, January 1, 2019 8:30 PM
To: Dongjoon Hyun
Cc: dev
Subject: Re: Apache Spark 2.2.3 ?

I agree with that logic, and if you're volunteering to do the legwork,
I don't see a reason not to cut a final 2.2 release.

On Tue, Jan 1, 2019 at 9:19 PM Dongjoon Hyun  wrote:
>
> Hi, All.
>
> Apache Spark community has a policy maintaining the feature branch for 18 
> months. I think it's time for the 2.2.3 release since 2.2.0 is released on 
> July 2017.
>
> http://spark.apache.org/versioning-policy.html
>
> After 2.2.2 (July 2018), `branch-2.2` has 40 patches (including security 
> patches).
>
> https://gitbox.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.2
>
> If it's okay and there is no further plan on `branch-2.2`, I want to 
> volunteer to prepare the first RC (early next week?).
>
> Please let me know your opinions about this.
>
> Bests,
> Dongjoon.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Podling Report Reminder - January 2019

2018-12-28 Thread Felix Cheung
Thanks Subbu.

Is this the first report? If so, you can add: source repo migrated to Apache, 
mailing lists and JIRA created, and a website created.

Otherwise LGTM



From: Subbu Subramaniam 
Sent: Friday, December 28, 2018 5:12 PM
To: g.kish...@gmail.com; dev@pinot.apache.org
Subject: Re: Podling Report Reminder - January 2019

Thanks Felix.

I have updated the podling page. Please take a look and let me know if there is 
anything else needed.

-Subbu

From: Felix Cheung 
Sent: Thursday, December 27, 2018 12:12 PM
To: dev@pinot.apache.org; g.kish...@gmail.com
Subject: Re: Podling Report Reminder - January 2019

Another reminder since we are less than one week away from the deadline.


From: Felix Cheung 
Sent: Friday, December 21, 2018 8:07 PM
To: dev@pinot.apache.org; g.kish...@gmail.com
Subject: Re: Podling Report Reminder - January 2019

Hi - quick reminder please see below.

Report is due Wed January 02 -- Podling reports due by end of day

Best to kick off discussions soon. Happy holidays!



From: jmcl...@apache.org
Sent: Friday, December 21, 2018 4:47 PM
To: d...@pinot.incubator.apache.org
Subject: Podling Report Reminder - January 2019

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 16 January 2019, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, January 02).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

* Your project name
* A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
* A list of the three most important issues to address in the move
towards graduation.
* Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
* How has the community developed since the last report
* How has the project developed since the last report.
* How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://wiki.apache.org/incubator/January2019

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC

-
To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
For additional commands, e-mail: dev-h...@pinot.apache.org



Pinot website and DOAP

2018-12-28 Thread Felix Cheung
Hello,

I’ve opened a PR to update some trademark references on the website
https://github.com/apache/incubator-pinot-site/pull/4

I've also opened a JIRA on the link back:
https://issues.apache.org/jira/browse/PINOT-2

And a JIRA on the DOAP:
https://issues.apache.org/jira/browse/PINOT-3

Regards



Re: Podling Report Reminder - January 2019

2018-12-27 Thread Felix Cheung
Another reminder since we are less than one week away from the deadline.


From: Felix Cheung 
Sent: Friday, December 21, 2018 8:07 PM
To: dev@pinot.apache.org; g.kish...@gmail.com
Subject: Re: Podling Report Reminder - January 2019

Hi - quick reminder please see below.

Report is due Wed January 02 -- Podling reports due by end of day

Best to kick off discussions soon. Happy holidays!



From: jmcl...@apache.org
Sent: Friday, December 21, 2018 4:47 PM
To: d...@pinot.incubator.apache.org
Subject: Podling Report Reminder - January 2019

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 16 January 2019, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, January 02).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

* Your project name
* A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
* A list of the three most important issues to address in the move
towards graduation.
* Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
* How has the community developed since the last report
* How has the project developed since the last report.
* How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://wiki.apache.org/incubator/January2019

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC

-
To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
For additional commands, e-mail: dev-h...@pinot.apache.org



Re: Podling Report Reminder - January 2019

2018-12-21 Thread Felix Cheung
Hi - quick reminder please see below.

Report is due Wed January 02 -- Podling reports due by end of day

Best to kick off discussions soon. Happy holidays!



From: jmcl...@apache.org
Sent: Friday, December 21, 2018 4:47 PM
To: d...@pinot.incubator.apache.org
Subject: Podling Report Reminder - January 2019

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 16 January 2019, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, January 02).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

* Your project name
* A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
* A list of the three most important issues to address in the move
towards graduation.
* Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
* How has the community developed since the last report
* How has the project developed since the last report.
* How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://wiki.apache.org/incubator/January2019

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC

-
To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
For additional commands, e-mail: dev-h...@pinot.apache.org



Re: Requested gitbox migration

2018-12-19 Thread Felix Cheung
Thanks!


From: Jongyoul Lee 
Sent: Wednesday, December 19, 2018 6:09:50 PM
To: dev
Subject: Requested gitbox migration

Please check the link below:
https://issues.apache.org/jira/browse/INFRA-17477

JL

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: [DISCUSS] Make interpreters' repository

2018-12-14 Thread Felix Cheung
Sure.



From: Jeff Zhang 
Sent: Thursday, December 13, 2018 11:29 PM
To: Jongyoul Lee
Cc: dev
Subject: Re: [DISCUSS] Make interpreters' repository

Thanks @Jongyoul Lee  , let's clean the interface
first.

Jongyoul Lee wrote on Fri, Dec 14, 2018 at 1:51 PM:

> Thanks, Jeff and Felix,
>
> I simply thought it would be better to construct a clearer view for
> devs. But, as you mentioned, it could scatter our attention. Let's try
> to make an effort to clean up our code and test scenarios inside one repository.
>
> Thank you guys.
>
> On Fri, Dec 14, 2018 at 12:03 PM Jeff Zhang  wrote:
>
>> Agree with Felix, this is what I also concern in my last email. @Jongyoul
>> Lee  Could you explain more about how separate repo
>> help here ? Thanks
>>
>>
>> Felix Cheung  于2018年12月14日周五 上午11:00写道:
>>
>>> In my opinion, definitely a clean interface will be very useful and
>>> having a better way to test is good.
>>>
>>> But sounds to me like these should be possible without a code repo
>>> separation?
>>>
>>> The downside of separate repo (I assume still under ASF) is the
>>> spreading the attention of committers and contributors.
>>>
>>>
>>>
>>> 
>>> From: Jongyoul Lee 
>>> Sent: Tuesday, December 11, 2018 10:33 PM
>>> To: dev
>>> Subject: Re: [DISCUSS] Make interpreters' repository
>>>
>>> And for testing Z as well, we don't have to build Spark Interpreter again
>>> for integration tests. and even, we don't have to build zeppelin server
>>> and
>>> web for integration test for spark. We just use components built. I
>>> believe
>>> it makes our CI as well faster.
>>>
>>> On Wed, Dec 12, 2018 at 3:29 PM Jongyoul Lee  wrote:
>>>
>>> > Yes, right. BTW, I think we need to make dependencies clearly between
>>> > zeppelin-server and interpreters, and even among interpreters. Some
>>> > versions properties are used in zeppelin-server and interpreters, but
>>> no
>>> > one has any clear view for them. So I thought it would be a change
>>> when we
>>> > divide repositories. The second one is about building and compiling. We
>>> > don't have to build Zeppelin fully when building some components. We
>>> also
>>> > can do it with custom build options including '-pl !...'. I don't think
>>> > it's good and there's no reason to keep this kind of inconvenience.
>>> What do
>>> > you think?
>>> >
>>> > Regards,
>>> > JL
>>> >
>>> > On Wed, Dec 12, 2018 at 3:08 PM Jeff Zhang  wrote:
>>> >
>>> >> Hi Jongyoul,
>>> >>
>>> >> Thanks for bring this up. I don't understand how different repo will
>>> help
>>> >> here, but I thought about moving interpreters out of zeppelin for a
>>> long
>>> >> time, but don't have bandwidth for it. The release cycle of zeppelin
>>> core
>>> >> component (zeppelin-zengine, zeppelin-server) should not block the
>>> release
>>> >> of interpreter component (unless they depends on some features of
>>> >> zeppelin-zengine, zeppelin-server).
>>> >>
>>> >>
>>> >> Jongyoul Lee  于2018年12月12日周三 上午10:38写道:
>>> >>
>>> >> > Hi, dev and committers,
>>> >> >
>>> >> > Currently, I'm seeing the repositories of another apache projects.
>>> They
>>> >> > have some several repositories with different purposes. I'd like to
>>> >> suggest
>>> >> > you that we divide repositories between zeppelin-server and others.
>>> >> >
>>> >> > This will help you develop zeppelin-server without interfering from
>>> >> other
>>> >> > components and its dependencies. Even, in the case of interpreters,
>>> It
>>> >> will
>>> >> > provide more independent environments for developing interpreters
>>> >> > themselves. Currently, we had a lot of dependencies and various
>>> versions
>>> >> > for each interpreter.
>>> >> >
>>> >> > WDYT?
>>> >> >
>>> >> > Regards,
>>> >> > JL
>>> >> >
>>> >> > --
>>> >> > 이종열, Jongyoul Lee, 李宗烈
>>> >> > http://madeng.net
>>> >> >
>>> >>
>>> >>
>>> >> --
>>> >> Best Regards
>>> >>
>>> >> Jeff Zhang
>>> >>
>>> >
>>> >
>>> > --
>>> > 이종열, Jongyoul Lee, 李宗烈
>>> > http://madeng.net
>>> >
>>>
>>>
>>> --
>>> 이종열, Jongyoul Lee, 李宗烈
>>> http://madeng.net
>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>


--
Best Regards

Jeff Zhang


Re: [DISCUSS] Fwd: [NOTICE] Mandatory relocation of Apache git repositories on git-wip-us.apache.org

2018-12-14 Thread Felix Cheung
+1

Then let’s go ahead with this. I’m going to keep this open over the weekend and 
then open an INFRA ticket unless anyone has a concern.




From: Prabhjyot Singh 
Sent: Tuesday, December 11, 2018 9:22 AM
To: dev
Subject: Re: [DISCUSS] Fwd: [NOTICE] Mandatory relocation of Apache git 
repositories on git-wip-us.apache.org

+1 for earlier.

On Tue, 11 Dec 2018 at 22:46, Felix Cheung 
wrote:

> Yes that’s all it takes to migrate.
>
> (And committers to setup gitbox link)
>
> Any more thoughts from the community?
>
>
> 
> From: Jongyoul Lee 
> Sent: Tuesday, December 11, 2018 1:38 AM
> To: dev
> Subject: Re: [DISCUSS] Fwd: [NOTICE] Mandatory relocation of Apache git
> repositories on git-wip-us.apache.org
>
> We could create a ticket for the infra only, correct?
>
> On Mon, Dec 10, 2018 at 12:45 PM Jeff Zhang  wrote:
>
> > Definitely +1 for earlier, anyone volunteer for this ?
> >
> >
> > Jongyoul Lee wrote on Mon, Dec 10, 2018 at 11:34 AM:
> >
> > > I don't think we have any special reason not to move there.
> > >
> > > +1 for earlier
> > >
> > > On Mon, Dec 10, 2018 at 3:56 AM Felix Cheung 
> > > wrote:
> > >
> > > > Hi community,
> > > >
> > > > The move to gitbox is coming. This does not affect Contributors -
> > mostly
> > > > how PR is merged. We could choose to voluntarily move early, or wait
> > till
> > > > later.
> > > >
> > > > So to discuss, should we move early?
> > > >
> > > >
> > > > -- Forwarded message -
> > > > From: Daniel Gruno 
> > > > Date: Fri, Dec 7, 2018 at 8:54 AM
> > > > Subject: [NOTICE] Mandatory relocation of Apache git repositories on
> > > > git-wip-us.apache.org
> > > > To: us...@infra.apache.org 
> > > >
> > > >
> > > > [IF YOUR PROJECT DOES NOT HAVE GIT REPOSITORIES ON GIT-WIP-US PLEASE
> > > > DISREGARD THIS EMAIL; IT WAS MASS-MAILED TO ALL APACHE PROJECTS]
> > > >
> > > > Hello Apache projects,
> > > >
> > > > I am writing to you because you may have git repositories on the
> > > > git-wip-us server, which is slated to be decommissioned in the coming
> > > > months. All repositories will be moved to the new gitbox service
> which
> > > > includes direct write access on github as well as the standard ASF
> > > > commit access via gitbox.apache.org.
> > > >
> > > > ## Why this move? ##
> > > > The move comes as a result of retiring the git-wip service, as the
> > > > hardware it runs on is longing for retirement. In lieu of this, we
> > > > have decided to consolidate the two services (git-wip and gitbox), to
> > > > ease the management of our repository systems and future-proof the
> > > > underlying hardware. The move is fully automated, and ideally,
> nothing
> > > > will change in your workflow other than added features and access to
> > > > GitHub.
> > > >
> > > > ## Timeframe for relocation ##
> > > > Initially, we are asking that projects voluntarily request to move
> > > > their repositories to gitbox, hence this email. The voluntary
> > > > timeframe is between now and January 9th 2019, during which projects
> > > > are free to either move over to gitbox or stay put on git-wip. After
> > > > this phase, we will be requiring the remaining projects to move
> within
> > > > one month, after which we will move the remaining projects over.
> > > >
> > > > To have your project moved in this initial phase, you will need:
> > > >
> > > > - Consensus in the project (documented via the mailing list)
> > > > - File a JIRA ticket with INFRA to voluntarily move your project
> repos
> > > > over to gitbox (as stated, this is highly automated and will take
> > > > between a minute and an hour, depending on the size and number of
> > > > your repositories)
> > > >
> > > > To sum up the preliminary timeline;
> > > >
> > > > - December 9th 2018 -> January 9th 2019: Voluntary (coordinated)
> > > > relocation
> > > > - January 9th -> February 6th: Mandated (coordinated) relocation
> > > > - February 7th: All remaining repositories are mass migrated.
> > > >
> > > This timeline may change to accommodate various scenarios.

Re: [DISCUSS] Moving to gitbox

2018-12-14 Thread Felix Cheung
I believe that’s what the earlier thread is for.



From: Jongyoul Lee 
Sent: Thursday, December 13, 2018 9:54 PM
To: dev
Subject: Re: [DISCUSS] Moving to gitbox

Yes, right.

That's because Infra suggests that we leave a mail thread to make a
consensus about it.


On Fri, Dec 14, 2018 at 12:14 PM Felix Cheung 
wrote:

> Hi Jongyoul - is this the same as the earlier thread?
>
>
> 
> From: Jongyoul Lee 
> Sent: Tuesday, December 11, 2018 6:28 PM
> To: dev
> Subject: [DISCUSS] Moving to gitbox
>
> Hi, devs,
>
> I'd like to make a consensus to move our repository from git-wip to gitbox.
>
> Please give your opinions with replies from this email.
>
> Thanks in advance,
> JL
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>


--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: [DISCUSS] Moving to gitbox

2018-12-13 Thread Felix Cheung
Hi Jongyoul - is this the same as the earlier thread?



From: Jongyoul Lee 
Sent: Tuesday, December 11, 2018 6:28 PM
To: dev
Subject: [DISCUSS] Moving to gitbox

Hi, devs,

I'd like to reach a consensus on moving our repository from git-wip to gitbox.

Please give your opinions with replies from this email.

Thanks in advance,
JL

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: [DISCUSS] Make interpreters' repository

2018-12-13 Thread Felix Cheung
In my opinion, a clean interface will definitely be very useful, and having a 
better way to test is good.

But it sounds to me like these should be possible without separating the code repos?

The downside of a separate repo (I assume still under ASF) is spreading the 
attention of committers and contributors thin.




From: Jongyoul Lee 
Sent: Tuesday, December 11, 2018 10:33 PM
To: dev
Subject: Re: [DISCUSS] Make interpreters' repository

And for testing Zeppelin as well, we don't have to build the Spark interpreter again
for integration tests; we don't even have to build the Zeppelin server and
web for the Spark integration tests. We can just use the components already built. I believe
it also makes our CI faster.

On Wed, Dec 12, 2018 at 3:29 PM Jongyoul Lee  wrote:

> Yes, right. BTW, I think we need to make the dependencies clear between
> zeppelin-server and the interpreters, and even among interpreters. Some
> version properties are used in both zeppelin-server and the interpreters, but no
> one has a clear view of them. So I thought it would be a chance to fix that when we
> divide the repositories. The second point is about building and compiling. We
> shouldn't have to build Zeppelin fully when building some components. We can also
> do it with custom build options such as '-pl !...', but I don't think
> that's good, and there's no reason to keep this kind of inconvenience. What do
> you think?
>
> Regards,
> JL
>
> On Wed, Dec 12, 2018 at 3:08 PM Jeff Zhang  wrote:
>
>> Hi Jongyoul,
>>
>> Thanks for bringing this up. I don't understand how a different repo will help
>> here, but I have thought about moving interpreters out of Zeppelin for a long
>> time; I just don't have the bandwidth for it. The release cycle of the Zeppelin core
>> components (zeppelin-zengine, zeppelin-server) should not block the release
>> of the interpreter components (unless they depend on some features of
>> zeppelin-zengine, zeppelin-server).
>>
>>
>> Jongyoul Lee wrote on Wed, Dec 12, 2018 at 10:38 AM:
>>
>> > Hi, dev and committers,
>> >
>> > Currently, I'm looking at the repositories of other Apache projects. They
>> > have several repositories with different purposes. I'd like to
>> > suggest that we divide our repositories between zeppelin-server and the others.
>> >
>> > This will help you develop zeppelin-server without interference from
>> > other components and their dependencies. Even in the case of interpreters, it
>> > will provide more independent environments for developing the interpreters
>> > themselves. Currently, we have a lot of dependencies and various versions
>> > for each interpreter.
>> >
>> > WDYT?
>> >
>> > Regards,
>> > JL
>> >
>> > --
>> > 이종열, Jongyoul Lee, 李宗烈
>> > http://madeng.net
>> >
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>


--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: [DISCUSS] Fwd: [NOTICE] Mandatory relocation of Apache git repositories on git-wip-us.apache.org

2018-12-11 Thread Felix Cheung
Yes that’s all it takes to migrate.

(And committers to setup gitbox link)

Any more thoughts from the community?



From: Jongyoul Lee 
Sent: Tuesday, December 11, 2018 1:38 AM
To: dev
Subject: Re: [DISCUSS] Fwd: [NOTICE] Mandatory relocation of Apache git 
repositories on git-wip-us.apache.org

We could create a ticket for the infra only, correct?

On Mon, Dec 10, 2018 at 12:45 PM Jeff Zhang  wrote:

> Definitely +1 for earlier, anyone volunteer for this ?
>
>
> > Jongyoul Lee wrote on Mon, Dec 10, 2018 at 11:34 AM:
>
> > I don't think we have any special reason not to move there.
> >
> > +1 for earlier
> >
> > On Mon, Dec 10, 2018 at 3:56 AM Felix Cheung 
> > wrote:
> >
> > > Hi community,
> > >
> > > The move to gitbox is coming. This does not affect Contributors -
> mostly
> > > how PR is merged. We could choose to voluntarily move early, or wait
> till
> > > later.
> > >
> > > So to discuss, should we move early?
> > >
> > >
> > > -- Forwarded message -
> > > From: Daniel Gruno 
> > > Date: Fri, Dec 7, 2018 at 8:54 AM
> > > Subject: [NOTICE] Mandatory relocation of Apache git repositories on
> > > git-wip-us.apache.org
> > > To: us...@infra.apache.org 
> > >
> > >
> > > [IF YOUR PROJECT DOES NOT HAVE GIT REPOSITORIES ON GIT-WIP-US PLEASE
> > > DISREGARD THIS EMAIL; IT WAS MASS-MAILED TO ALL APACHE PROJECTS]
> > >
> > > Hello Apache projects,
> > >
> > > I am writing to you because you may have git repositories on the
> > > git-wip-us server, which is slated to be decommissioned in the coming
> > > months. All repositories will be moved to the new gitbox service which
> > > includes direct write access on github as well as the standard ASF
> > > commit access via gitbox.apache.org.
> > >
> > > ## Why this move? ##
> > > The move comes as a result of retiring the git-wip service, as the
> > > hardware it runs on is longing for retirement. In lieu of this, we
> > > have decided to consolidate the two services (git-wip and gitbox), to
> > > ease the management of our repository systems and future-proof the
> > > underlying hardware. The move is fully automated, and ideally, nothing
> > > will change in your workflow other than added features and access to
> > > GitHub.
> > >
> > > ## Timeframe for relocation ##
> > > Initially, we are asking that projects voluntarily request to move
> > > their repositories to gitbox, hence this email. The voluntary
> > > timeframe is between now and January 9th 2019, during which projects
> > > are free to either move over to gitbox or stay put on git-wip. After
> > > this phase, we will be requiring the remaining projects to move within
> > > one month, after which we will move the remaining projects over.
> > >
> > > To have your project moved in this initial phase, you will need:
> > >
> > > - Consensus in the project (documented via the mailing list)
> > > - File a JIRA ticket with INFRA to voluntarily move your project repos
> > > over to gitbox (as stated, this is highly automated and will take
> > > between a minute and an hour, depending on the size and number of
> > > your repositories)
> > >
> > > To sum up the preliminary timeline;
> > >
> > > - December 9th 2018 -> January 9th 2019: Voluntary (coordinated)
> > > relocation
> > > - January 9th -> February 6th: Mandated (coordinated) relocation
> > > - February 7th: All remaining repositories are mass migrated.
> > >
> > > This timeline may change to accommodate various scenarios.
> > >
> > > ## Using GitHub with ASF repositories ##
> > > When your project has moved, you are free to use either the ASF
> > > repository system (gitbox.apache.org) OR GitHub for your development
> > > and code pushes. To be able to use GitHub, please follow the primer
> > > at: https://reference.apache.org/committer/github
> > >
> > >
> > > We appreciate your understanding of this issue, and hope that your
> > > project can coordinate voluntarily moving your repositories in a
> > > timely manner.
> > >
> > > All settings, such as commit mail targets, issue linking, PR
> > > notification schemes etc will automatically be migrated to gitbox as
> > > well.
> > >
> > > With regards, Daniel on behalf of ASF Infra.
> > >
> > > PS:For inquiries, please reply to us...@infra.apache.org, not your
> > > project's dev list :-).
> > >
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > For additional commands, e-mail: general-h...@incubator.apache.org
> > >
> >
> >
> > --
> > 이종열, Jongyoul Lee, 李宗烈
> > http://madeng.net
> >
>
>
> --
> Best Regards
>
> Jeff Zhang
>


--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: [DISCUSS] Mandatory relocation of Apache git repositories on git-wip-us.apache.org

2018-12-10 Thread Felix Cheung
+1

On Mon, Dec 10, 2018 at 12:56 PM Roy Lenferink 
wrote:

> Hi all,
>
> The Apache Incubator is still having a repository on git-wip-us as well
> [1].
>
> Does anyone have a problem with moving over the incubator repository to
> gitbox voluntarily?
> This means integrated access and easy PRs (write access to the GitHub
> repo).
>
> We need to document support for the decision from a mailing list post, so
> here it is.
>
> - Roy
>
> [1] https://git-wip-us.apache.org/repos/asf/incubator.git
>
> -- Forwarded message -
> From: Daniel Gruno 
> Date: vr 7 dec. 2018 om 17:53
> Subject: [NOTICE] Mandatory relocation of Apache git repositories on
> git-wip-us.apache.org
> To: us...@infra.apache.org 
>
> [IF YOUR PROJECT DOES NOT HAVE GIT REPOSITORIES ON GIT-WIP-US PLEASE
>   DISREGARD THIS EMAIL; IT WAS MASS-MAILED TO ALL APACHE PROJECTS]
>
> Hello Apache projects,
>
> I am writing to you because you may have git repositories on the
> git-wip-us server, which is slated to be decommissioned in the coming
> months. All repositories will be moved to the new gitbox service which
> includes direct write access on github as well as the standard ASF
> commit access via gitbox.apache.org.
>
> ## Why this move? ##
> The move comes as a result of retiring the git-wip service, as the
> hardware it runs on is longing for retirement. In lieu of this, we
> have decided to consolidate the two services (git-wip and gitbox), to
> ease the management of our repository systems and future-proof the
> underlying hardware. The move is fully automated, and ideally, nothing
> will change in your workflow other than added features and access to
> GitHub.
>
> ## Timeframe for relocation ##
> Initially, we are asking that projects voluntarily request to move
> their repositories to gitbox, hence this email. The voluntary
> timeframe is between now and January 9th 2019, during which projects
> are free to either move over to gitbox or stay put on git-wip. After
> this phase, we will be requiring the remaining projects to move within
> one month, after which we will move the remaining projects over.
>
> To have your project moved in this initial phase, you will need:
>
> - Consensus in the project (documented via the mailing list)
> - File a JIRA ticket with INFRA to voluntarily move your project repos
>over to gitbox (as stated, this is highly automated and will take
>between a minute and an hour, depending on the size and number of
>your repositories)
>
> To sum up the preliminary timeline;
>
> - December 9th 2018 -> January 9th 2019: Voluntary (coordinated)
>relocation
> - January 9th -> February 6th: Mandated (coordinated) relocation
> - February 7th: All remaining repositories are mass migrated.
>
> This timeline may change to accommodate various scenarios.
>
> ## Using GitHub with ASF repositories ##
> When your project has moved, you are free to use either the ASF
> repository system (gitbox.apache.org) OR GitHub for your development
> and code pushes. To be able to use GitHub, please follow the primer
> at: https://reference.apache.org/committer/github
>
>
> We appreciate your understanding of this issue, and hope that your
> project can coordinate voluntarily moving your repositories in a
> timely manner.
>
> All settings, such as commit mail targets, issue linking, PR
> notification schemes etc will automatically be migrated to gitbox as
> well.
>
> With regards, Daniel on behalf of ASF Infra.
>
> PS:For inquiries, please reply to us...@infra.apache.org, not your
> project's dev list :-).
>


Re: Questions about building the website of IoTDB Project

2018-12-09 Thread Felix Cheung
Looks good! Thanks for the mock ups.

On your homepage, references to Hadoop and Spark should be Apache Hadoop and 
Apache Spark, because of trademarks - and it might be good to link to their 
project pages.




From: Stefanie Zhao 
Sent: Friday, December 7, 2018 2:03 AM
To: 黄向东
Cc: dev@iotdb.apache.org; 徐毅
Subject: Re:Re: Questions about building the website of IoTDB Project

Dear all,


Prototype page links are as follows; questions and suggestions are welcome.
Here are the links:
【homepage】https://s1.ax1x.com/2018/12/07/F3E1T1.png
【download page】https://s1.ax1x.com/2018/12/07/F3ElwR.png
【documentation page】https://s1.ax1x.com/2018/12/07/F3E8Fx.png
【tools page】https://s1.ax1x.com/2018/12/07/F3EJfK.png
【other page】https://s1.ax1x.com/2018/12/07/F3EQm9.png


Best, Xinyi
--

Xinyi Zhao(Stefanie)
School of Software, Tsinghua University
E-mail:stefanie_...@163.com


At 2018-12-07 11:01:56, "Xiangdong Huang"  wrote:

Hi,


Xinyi has finished the prototype pages.


Yi Xu, can you build the website with several classmates?


You can follow some other apache project websites.


Notice that the runtime is the Apache HTTP Server. Besides, I noticed that many 
existing projects use Jekyll. You could give it a try.


Xinyi, can you attach the prototype pages here as a JPEG? (I am not sure 
whether the mailing list supports figures.)


Best,


---
Xiangdong Huang
School of Software, Tsinghua University


黄向东
清华大学 软件学院




Christofer Dutz  于2018年11月30日周五 下午8:07写道:

Hi all,

yeah ... been there ... done that ... guilty as charged.

Well in general Apache websites are simple static HTML/JS/CSS content served by 
a really fat Apache HTTPD server.

However, you don't copy stuff to the webserver directly. As mentioned before, you 
usually set up a detached branch in a code repo named "asf-site" and then ask 
infra to sync that with your project's website.

In the Apache PLC4X (incubating) project we generate the website as part of the 
maven build. Everything is automatically pushed to the asf-site branch of a 
dedicated website git repo.

However this automatic push has to be executed on Jenkins nodes tagged with 
"git-websites" as only these have the credentials to automatically push to ASF 
code repos.

I will gladly help you get set up as I know this can be quite a pain if you 
don't know all the little details.


Chris



Am 30.11.18, 04:34 schrieb "Justin Mclean" :

HI,

And created for you [1]. Chris (one of the other mentors) has experience 
creating apache websites from scratch and may have some advice to offer.

Thanks,
Justin

1. https://github.com/apache/incubator-iotdb-website



[DISCUSS] Fwd: [NOTICE] Mandatory relocation of Apache git repositories on git-wip-us.apache.org

2018-12-09 Thread Felix Cheung
Hi community,

The move to gitbox is coming. This does not affect Contributors - mostly
how PR is merged. We could choose to voluntarily move early, or wait till
later.

So to discuss, should we move early?


-- Forwarded message -
From: Daniel Gruno 
Date: Fri, Dec 7, 2018 at 8:54 AM
Subject: [NOTICE] Mandatory relocation of Apache git repositories on
git-wip-us.apache.org
To: us...@infra.apache.org 


[IF YOUR PROJECT DOES NOT HAVE GIT REPOSITORIES ON GIT-WIP-US PLEASE
  DISREGARD THIS EMAIL; IT WAS MASS-MAILED TO ALL APACHE PROJECTS]

Hello Apache projects,

I am writing to you because you may have git repositories on the
git-wip-us server, which is slated to be decommissioned in the coming
months. All repositories will be moved to the new gitbox service which
includes direct write access on github as well as the standard ASF
commit access via gitbox.apache.org.

## Why this move? ##
The move comes as a result of retiring the git-wip service, as the
hardware it runs on is longing for retirement. In lieu of this, we
have decided to consolidate the two services (git-wip and gitbox), to
ease the management of our repository systems and future-proof the
underlying hardware. The move is fully automated, and ideally, nothing
will change in your workflow other than added features and access to
GitHub.

## Timeframe for relocation ##
Initially, we are asking that projects voluntarily request to move
their repositories to gitbox, hence this email. The voluntary
timeframe is between now and January 9th 2019, during which projects
are free to either move over to gitbox or stay put on git-wip. After
this phase, we will be requiring the remaining projects to move within
one month, after which we will move the remaining projects over.

To have your project moved in this initial phase, you will need:

- Consensus in the project (documented via the mailing list)
- File a JIRA ticket with INFRA to voluntarily move your project repos
   over to gitbox (as stated, this is highly automated and will take
   between a minute and an hour, depending on the size and number of
   your repositories)

To sum up the preliminary timeline;

- December 9th 2018 -> January 9th 2019: Voluntary (coordinated)
   relocation
- January 9th -> February 6th: Mandated (coordinated) relocation
- February 7th: All remaining repositories are mass migrated.

This timeline may change to accommodate various scenarios.

## Using GitHub with ASF repositories ##
When your project has moved, you are free to use either the ASF
repository system (gitbox.apache.org) OR GitHub for your development
and code pushes. To be able to use GitHub, please follow the primer
at: https://reference.apache.org/committer/github


We appreciate your understanding of this issue, and hope that your
project can coordinate voluntarily moving your repositories in a
timely manner.

All settings, such as commit mail targets, issue linking, PR
notification schemes etc will automatically be migrated to gitbox as
well.

With regards, Daniel on behalf of ASF Infra.

PS:For inquiries, please reply to us...@infra.apache.org, not your
project's dev list :-).



-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org


Re: Incubator Podling Report (Due 5th December)

2018-12-05 Thread Felix Cheung
Hi - just a quick note that the report is due today.

Justin has sent a separate reminder. It’s important to put some quick status on 
this.

For example, from what I know:

- source repo migrated
- email list setup - dev@ traffic is a bit light
- website in progress
- no change to committer & PPMC
- ? contributor growth

If dev@ is ok with this summary I can also write this into the wiki page as the 
project report.



From: Felix Cheung 
Sent: Monday, December 3, 2018 9:58 AM
To: dev@pinot.apache.org; kishore g; dev@pinot.apache.org; jmcl...@apache.org
Subject: Re: Incubator Podling Report (Due 5th December)

Process info here
https://incubator.apache.org/guides/ppmc.html#podling_status_reports

It’s a wiki page
https://wiki.apache.org/incubator/December2018



From: Subbu Subramaniam 
Sent: Monday, December 3, 2018 9:39 AM
To: kishore g; dev@pinot.apache.org; jmcl...@apache.org
Subject: Re: Incubator Podling Report (Due 5th December)

What is involved here?

Having never done one before, I have no idea. Is there a template for the 
report?

thanks

-Subbu

From: kishore g 
Sent: Sunday, December 2, 2018 2:57 PM
To: dev@pinot.apache.org
Subject: Re: Incubator Podling Report (Due 5th December)

Subbu, do you want to take a stab at this?

On Sun, Dec 2, 2018 at 12:30 PM Justin Mclean  wrote:

> Hi,
>
> The incubator PMC would appreciate it if you could complete the podling
> report on time; it's due on 5th December, in a few days. It takes time to
> prepare the incubator report, have your mentors sign off on the report, and for
> the board to review it, so it's best if you can get it in early.
>
> Thanks,
> Justin
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
> For additional commands, e-mail: dev-h...@pinot.apache.org
>
>


Re: draft podling report

2018-12-05 Thread Felix Cheung
Great! I’ve added myself preemptively to the wiki and signed off the Crail 
report.

Could one of you please add me on Whimsy, if it makes sense, as a mentor? Thanks



From: Luciano Resende 
Sent: Tuesday, December 4, 2018 2:01 PM
To: dev@crail.apache.org
Subject: Re: draft podling report

+1, have already signed off.

On Tue, Dec 4, 2018 at 9:30 AM Julian Hyde  wrote:
>
> Looks good. Thanks for doing this promptly. I'll sign off tomorrow
> after it is final.
>
> Felix, As a new mentor, feel free to add your name below the report,
> sign off, and add comments.
>
> Julian
>
> On Tue, Dec 4, 2018 at 9:18 AM bernard metzler  wrote:
> >
> > Dear all, I added a draft podling report for Crail to
> > https://wiki.apache.org/incubator/December2018. Please
> > check that it all makes sense. Dear mentors, as usual,
> > be frank if something looks suspicious, is inappropriate,
> > or is missing. I think we can still edit until tomorrow
> > evening.
> >
> > Thanks a lot,
> > Bernard.



--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


Incubator Wiki write access

2018-12-04 Thread Felix Cheung
The wiki might have lost my earlier account.

My username is FelixCheung

Thanks


Re: [VOTE] Apache Crail 1.1-incubating (rc8)

2018-12-03 Thread Felix Cheung
Thanks for getting back to me Jonas.

Maybe I didn't pick up the change in CRAIL-74. By the way, some of the JIRAs do not
have links to their GitHub PRs?

Rat - yes, thanks, I think I was looking at an older change. I double-checked and
see the POM file rat exclusion does not include docker or doc.


On Mon, Dec 3, 2018 at 12:29 AM Jonas Pfefferle  wrote:

> Hi Felix
>
>
>   On Fri, 30 Nov 2018 15:43:45 -0800
>   Felix Cheung  wrote:
> > +1 (binding)
> >
> > a few comments below, checked:
> > filename
> > signature & hash
> > DISCLAIMER, LICENSE, NOTICE
> > build from src
> > no binary
> > src files have headers (see below)
> >
> > comments, not blocker for release IMO:
> > 1.
> > CREDITS file is a bit non-standard in an ASF release - this is
> >generally
> > not included as it is already captured in git history and SGA
>
> The CREDITS was introduced for the past IBM copyright notice:
> https://jira.apache.org/jira/projects/CRAIL/issues/CRAIL-33
>
> >
> > 2.
> > https://jira.apache.org/jira/projects/CRAIL/issues/CRAIL-74 is
> >marked as
> >Fixed but I don't see a change in the -bin tarball?
>
> At least on my machine the binary tarball now has a toplevel directory.
> Can
> someone else confirm?
>
> >
> > 3.
> > licenses/ directory do not need to include those from ASF and on
> >Apache v2
> > license, eg.
> > apache-crail-1.1-incubating/licenses $ grep -e "Apache" *
> > LICENSE.commons-logging.txt: Apache License
> > LICENSE.commons-math3-3.1.1: Apache License
>
> Makes sense, we will remove them on the next release.
>
> >
> > 4.
> > Doc mentions Libdisni is a requirement - it might help to list the
> > supported/tested releases of Libdisni
>
> I agree, the requirements for building/running Crail need to be fixed.
> What you need very much depends on which datatiers you want to run:
> https://jira.apache.org/jira/projects/CRAIL/issues/CRAIL-68
>
> >
> > 5.
> > ASF header - docker/* and doc/* and conf/* can also have ASF header
> >as
> > comment block - consider adding that
>
> docker/* and doc/* do have ASF headers; the only things excluded are
> conf/*, credits and licenses.
> Not sure what the point is of putting ASF headers in configuration file
> templates. I have checked multiple other projects and none had any.
>
>
> Thanks,
> Jonas
>
> >
> >
> > On Thu, Nov 29, 2018 at 6:50 AM Adrian Schuepbach
> >
> > wrote:
> >
> >> Hi all
> >>
> >> Please vote to approve the release of Apache Crail 1.1-incubating
> >>(rc8).
> >>
> >> The podling dev vote thread:
> >>
> >> https://www.mail-archive.com/dev@crail.apache.org/msg00519.html
> >>
> >> The result:
> >>
> >> https://www.mail-archive.com/dev@crail.apache.org/msg00526.html
> >>
> >> Commit hash: 08c75b55f7f97be869049cf80a0da5347e550a3d
> >>
> >>
> >>
> https://git-wip-us.apache.org/repos/asf?p=incubator-crail.git;a=commit;h=08c75b55f7f97be869049cf80a0da5347e550a3d
> >>
> >>
> >> Release files can be found at:
> >> https://dist.apache.org/repos/dist/dev/incubator/crail/1.1-rc8/
> >>
> >> The Nexus Staging URL:
> >> https://repository.apache.org/content/repositories/orgapachecrail-1007/
> >>
> >> Release artifacts are signed with the following key:
> >> https://www.apache.org/dist/incubator/crail/KEYS
> >>
> >> For information about the contents of this release, see:
> >>
> >>
> https://git-wip-us.apache.org/repos/asf?p=incubator-crail.git;a=blob_plain;f=HISTORY.md;hb=08c75b55f7f97be869049cf80a0da5347e550a3d
> >> or
> >>
> >>
> https://github.com/apache/incubator-crail/blob/08c75b55f7f97be869049cf80a0da5347e550a3d/HISTORY.md
> >>
> >> The vote is open for at least 72 hours and passes if a majority of
> >>at
> >> least 3 +1 PMC votes are cast.
> >>
> >> [ ] +1 Release this package as Apache Crail 1.1-incubating
> >> [ ] -1 Do not release this package because ...
> >>
> >> Thanks,
> >> Adrian
> >>
> >>
>
>


Re: Podling website

2018-12-03 Thread Felix Cheung
Great!


From: Seunghyun Lee 
Sent: Monday, December 3, 2018 11:26:15 AM
To: dev@pinot.apache.org
Subject: Re: Podling website

Hi Felix,


We are planning to build the incubator website for Pinot soon. We will update 
here once it is ready.


Best,

Seunghyun


From: Felix Cheung 
Sent: Sunday, December 2, 2018 3:08:29 PM
To: dev@pinot.apache.org
Subject: Podling website

Hi,

I want to see if there is any effort on building the incubator website. I’ve 
checked the GitHub issues and the mail archives.





Re: Incubator Podling Report (Due 5th December)

2018-12-03 Thread Felix Cheung
Process info here
https://incubator.apache.org/guides/ppmc.html#podling_status_reports

It’s a wiki page
https://wiki.apache.org/incubator/December2018



From: Subbu Subramaniam 
Sent: Monday, December 3, 2018 9:39 AM
To: kishore g; dev@pinot.apache.org; jmcl...@apache.org
Subject: Re: Incubator Podling Report (Due 5th December)

What is involved here?

Having never done one before, I have no idea. Is there a template for the 
report?

thanks

-Subbu

From: kishore g 
Sent: Sunday, December 2, 2018 2:57 PM
To: dev@pinot.apache.org
Subject: Re: Incubator Podling Report (Due 5th December)

Subbu, do you want to take a stab at this?

On Sun, Dec 2, 2018 at 12:30 PM Justin Mclean  wrote:

> Hi,
>
> The incubator PMC would appreciate it if you could complete the podling
> report on time; it's due on 5th December, in a few days. It takes time to
> prepare the incubator report, have your mentors sign off on the report, and for
> the board to review it, so it's best if you can get it in early.
>
> Thanks,
> Justin
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
> For additional commands, e-mail: dev-h...@pinot.apache.org
>
>


[jira] [Updated] (SPARK-26247) SPIP - ML Model Extension for no-Spark MLLib Online Serving

2018-12-02 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26247:
-
Description: 
This ticket tracks an SPIP to improve model load time and model serving 
interfaces for online serving of Spark MLlib models.  The SPIP is here

[https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]

 

The improvement opportunity exists in all versions of spark.  We developed our 
set of changes wrt version 2.1.0 and can port them forward to other versions 
(e.g., we have ported them forward to 2.3.2).

  was:
This ticket tracks an SPIP to improve model load time and model serving 
interfaces for online serving of Spark MLlib models.  The SPIP is here

[https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]

 

The improvement opportunity exists in all versions of spark.  We developed our 
set of changes wrt version 2.1.0 and can port them forward to other versions 
(e.g., wehave ported them forward to 2.3.2).


> SPIP - ML Model Extension for no-Spark MLLib Online Serving
> ---
>
> Key: SPARK-26247
> URL: https://issues.apache.org/jira/browse/SPARK-26247
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Anne Holler
>Priority: Major
>  Labels: SPIP
>
> This ticket tracks an SPIP to improve model load time and model serving 
> interfaces for online serving of Spark MLlib models.  The SPIP is here
> [https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]
>  
> The improvement opportunity exists in all versions of spark.  We developed 
> our set of changes wrt version 2.1.0 and can port them forward to other 
> versions (e.g., we have ported them forward to 2.3.2).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26247) SPIP - ML Model Extension for no-Spark MLLib Online Serving

2018-12-02 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26247:
-
Target Version/s: 3.0.0  (was: 2.1.0)

> SPIP - ML Model Extension for no-Spark MLLib Online Serving
> ---
>
> Key: SPARK-26247
> URL: https://issues.apache.org/jira/browse/SPARK-26247
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Anne Holler
>Priority: Major
>  Labels: SPIP
>
> This ticket tracks an SPIP to improve model load time and model serving 
> interfaces for online serving of Spark MLlib models.  The SPIP is here
> [https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]
>  
> The improvement opportunity exists in all versions of spark.  We developed 
> our set of changes wrt version 2.1.0 and can port them forward to other 
> versions (e.g., we have ported them forward to 2.3.2).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26247) SPIP - ML Model Extension for no-Spark MLLib Online Serving

2018-12-02 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26247:
-
Fix Version/s: (was: 2.1.0)

> SPIP - ML Model Extension for no-Spark MLLib Online Serving
> ---
>
> Key: SPARK-26247
> URL: https://issues.apache.org/jira/browse/SPARK-26247
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Anne Holler
>Priority: Major
>  Labels: SPIP
>
> This ticket tracks an SPIP to improve model load time and model serving 
> interfaces for online serving of Spark MLlib models.  The SPIP is here
> [https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]
>  
> The improvement opportunity exists in all versions of spark.  We developed 
> our set of changes wrt version 2.1.0 and can port them forward to other 
> versions (e.g., we have ported them forward to 2.3.2).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Podling website

2018-12-02 Thread Felix Cheung
Hi,

I want to see if there is any effort on building the incubator website. I’ve 
checked the GitHub issues and the mail archives.





[jira] [Resolved] (SPARK-26189) Fix the doc of unionAll in SparkR

2018-11-30 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-26189.
--
  Resolution: Fixed
Assignee: Huaxin Gao
   Fix Version/s: 3.0.0
Target Version/s: 3.0.0

> Fix the doc of unionAll in SparkR
> -
>
> Key: SPARK-26189
> URL: https://issues.apache.org/jira/browse/SPARK-26189
> Project: Spark
>  Issue Type: Documentation
>  Components: R
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: Huaxin Gao
>Priority: Minor
> Fix For: 3.0.0
>
>
> We should fix the doc of unionAll in SparkR. See the discussion: 
> https://github.com/apache/spark/pull/23131/files#r236760822



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Accept donation of Rust Parquet implementation

2018-11-30 Thread Felix Cheung
+1!!


From: Andy Grove 
Sent: Friday, November 30, 2018 4:26:21 PM
To: dev@arrow.apache.org
Subject: Re: [VOTE] Accept donation of Rust Parquet implementation

+1 and great to see this happening!

On Fri, Nov 30, 2018 at 4:51 PM Wes McKinney  wrote:

> Dear all,
>
> The developers of
>
> https://github.com/sunchao/parquet-rs
>
> have been in touch with Apache Arrow and Apache Parquet. Based on
> mailing list discussions, it is being proposed to donate this Parquet
> Rust implementation into the Apache Arrow codebase in order to develop
> it together with the Arrow Rust implementation, for similar reasons
> that we are co-developing the Arrow and Parquet C++ implementations.
> Some of the same developers are involved in both projects.
>
> See https://github.com/apache/arrow/pull/3050
>
> It remains to be determined if the Parquet PMC will choose to make
> separate Rust Parquet releases. This action does not close off that
> path, and is more about helping the Rust developers to be productive
> working on these projects.
>
> Parquet had previously voted to accept the code donation, but the
> effort stalled out earlier this year (in part due to my lack of
> bandwidth, since I had been offering to help steward the donation
> there).
>
> This vote is to determine if the Arrow PMC is in favor of accepting
> this donation. If the vote passes, the PMC and the authors of the code
> will work together to complete the ASF IP Clearance process
> (http://incubator.apache.org/ip-clearance/) and import the Rust
> implementation into the Arrow codebase.
>
> [ ] +1 : Accept contribution of Rust implementation
> [ ]  0 : No opinion
> [ ] -1 : Reject contribution because...
>
> Here is my vote: +1
>
> The vote will be open for at least 72 hours.
>
> Thanks,
> Wes
>


Re: [VOTE] Apache Crail 1.1-incubating (rc8)

2018-11-30 Thread Felix Cheung
+1 (binding)

a few comments below, checked:
filename
signature & hash
DISCLAIMER, LICENSE, NOTICE
build from src
no binary
src files have headers (see below)

comments, not blocker for release IMO:
1.
CREDITS file is a bit non-standard in an ASF release - this is generally
not included as it is already captured in git history and SGA

2.
https://jira.apache.org/jira/projects/CRAIL/issues/CRAIL-74 is marked as
Fixed but I don't see a change in the -bin tarball?

3.
licenses/ directory do not need to include those from ASF and on Apache v2
license, eg.
apache-crail-1.1-incubating/licenses $ grep -e "Apache" *
LICENSE.commons-logging.txt: Apache License
LICENSE.commons-math3-3.1.1: Apache License

4.
Doc mentions Libdisni is a requirement - it might help to list the
supported/tested releases of Libdisni

5.
ASF header - docker/* and doc/* and conf/* can also have ASF header as
comment block - consider adding that


On Thu, Nov 29, 2018 at 6:50 AM Adrian Schuepbach 
wrote:

> Hi all
>
> Please vote to approve the release of Apache Crail 1.1-incubating (rc8).
>
> The podling dev vote thread:
>
> https://www.mail-archive.com/dev@crail.apache.org/msg00519.html
>
> The result:
>
> https://www.mail-archive.com/dev@crail.apache.org/msg00526.html
>
> Commit hash: 08c75b55f7f97be869049cf80a0da5347e550a3d
>
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-crail.git;a=commit;h=08c75b55f7f97be869049cf80a0da5347e550a3d
>
>
> Release files can be found at:
> https://dist.apache.org/repos/dist/dev/incubator/crail/1.1-rc8/
>
> The Nexus Staging URL:
> https://repository.apache.org/content/repositories/orgapachecrail-1007/
>
> Release artifacts are signed with the following key:
> https://www.apache.org/dist/incubator/crail/KEYS
>
> For information about the contents of this release, see:
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-crail.git;a=blob_plain;f=HISTORY.md;hb=08c75b55f7f97be869049cf80a0da5347e550a3d
> or
>
> https://github.com/apache/incubator-crail/blob/08c75b55f7f97be869049cf80a0da5347e550a3d/HISTORY.md
>
> The vote is open for at least 72 hours and passes if a majority of at
> least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Crail 1.1-incubating
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Adrian
>
>
>


[jira] [Commented] (SPARK-21291) R bucketBy partitionBy API

2018-11-28 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701505#comment-16701505
 ] 

Felix Cheung commented on SPARK-21291:
--

hmm, ok

> R bucketBy partitionBy API
> --
>
> Key: SPARK-21291
> URL: https://issues.apache.org/jira/browse/SPARK-21291
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.0.0
>
>
> partitionBy exists but it's for windowspec only



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21291) R bucketBy partitionBy API

2018-11-27 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700743#comment-16700743
 ] 

Felix Cheung commented on SPARK-21291:
--

I think we need to reopen this Jira since bucketBy is not addressed.

> R bucketBy partitionBy API
> --
>
> Key: SPARK-21291
> URL: https://issues.apache.org/jira/browse/SPARK-21291
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.0.0
>
>
> partitionBy exists but it's for windowspec only



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Accept the Iceberg project for incubation

2018-11-13 Thread Felix Cheung
+1 (non binding)

awesome to see this being taken forward to the incubator, and looking forward
to collaborating with the community!


On Tue, Nov 13, 2018 at 9:09 AM Ryan Blue  wrote:

> +1 (binding)
>
> On Tue, Nov 13, 2018 at 9:06 AM Ryan Blue  wrote:
>
> > The discuss thread seems to have reached consensus, so I propose
> accepting
> > the Iceberg project for incubation.
> >
> > The proposal is copied below and in the wiki:
> > https://wiki.apache.org/incubator/IcebergProposal
> >
> > Please vote on whether to accept Iceberg in the next 72 hours:
> >
> > [ ] +1, accept Iceberg for incubation
> > [ ] -1, reject the Iceberg proposal because . . .
> >
> > Thank you for reviewing the proposal and voting,
> >
> > rb
> > --
> > Iceberg Proposal Abstract
> >
> > Iceberg is a table format for large, slow-moving tabular data.
> >
> > It is designed to improve on the de-facto standard table layout built
> into
> > Apache Hive, Presto, and Apache Spark.
> > Proposal
> >
> > The purpose of Iceberg is to provide SQL-like tables that are backed by
> > large sets of data files. Iceberg is similar to the Hive table layout,
> the
> > de-facto standard structure used to track files in a table, but provides
> > additional guarantees and performance optimizations:
> >
> >- Atomicity - Each change to the table will be complete or will
> >fail. “Do or do not. There is no try.”
> >- Snapshot isolation - Reads use one and only one snapshot of a table
> >at some time without holding a lock.
> >- Safe schema evolution - A table’s schema can change in well-defined
> >ways, without breaking older data files.
> >- Column projection - An engine may request a subset of the available
> >columns, including nested fields.
> >- Predicate pushdown - An engine can push filters into read planning
> >to improve performance using partition data and file-level statistics.
> >
> > Iceberg does NOT define a new file format. All data is stored in Apache
> > Avro, Apache ORC, or Apache Parquet files.
> >
> > Additionally, Iceberg is designed to work well when data files are stored
> > in cloud blob stores, even when those systems provide weaker guarantees
> > than a file system, including:
> >
> >- Eventual consistency in the namespace
> >- High latency for directory listings
> >- No renames of objects
> >- No folder hierarchy
> >
> > Rationale
> >
> > Initial benchmarks show dramatic improvements in query planning. For
> > example, in Netflix’s Atlas use case, which stores time-series metrics
> from
> > Netflix runtime systems and 1 month is stored across 2.7 million files in
> > 2,688 partitions:
> >
> >- Hive table using Parquet:
> >   - 400k+ splits, not combined
> >   - Explain query: 9.6 minutes wall time (planning only)
> >- Iceberg table with partition filtering:
> >   - 15,218 splits, combined
> >   - Planning: 10 seconds
> >   - Query wall time: 13 minutes
> >- Iceberg table with partition and min/max filtering:
> >   - 412 splits
> >   - Planning: 25 seconds
> >   - Query wall time: 42 seconds
> >
> > These performance gains combined with the cross-engine compatibility are
> a
> > very compelling story.
> > Initial Goals
> >
> > The initial goal will be to move the existing codebase to Apache and
> > integrate with the Apache development process and infrastructure. A
> primary
> > goal of incubation will be to grow and diversify the Iceberg community.
> We
> > are well aware that the project community is largely comprised of
> > individuals from a single company. We aim to change that during
> incubation.
> > Current Status
> >
> > As previously mentioned, Iceberg is under active development at Netflix,
> > and is being used in processing large volumes of data in Amazon EC2.
> >
> > Iceberg license documentation is already based on Apache guidelines for
> > LICENSE and NOTICE content.
> > Meritocracy
> >
> > We value meritocracy and we understand that it is the basis for an open
> > community that encourages multiple companies and individuals to
> contribute
> > and be invested in the project’s future. We will encourage and monitor
> > participation and make sure to extend privileges and responsibilities to
> > all contributors.
> > Community
> >
> > Iceberg is currently being used by developers at Netflix and a growing
> > number of users are actively using it in production environments. Iceberg
> > has received contributions from developers working at Hortonworks,
> WeWork,
> > and Palantir. By bringing Iceberg to Apache we aim to assure current and
> > future contributors that the Iceberg community is meritocratic and open,
> in
> > order to broaden and diversify the user and developer community.
> > Core Developers
> >
> > Iceberg was initially developed at Netflix and is under active
> > development. We believe Iceberg will be of interest to a broad range of
> > users and developers and that incubating the p

[jira] [Commented] (SPARK-24255) Require Java 8 in SparkR description

2018-11-12 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684668#comment-16684668
 ] 

Felix Cheung commented on SPARK-24255:
--

[~shivaram] I'm wondering if this handles all version cases?

*[kiszk|https://github.com/kiszk]* found this with various java versions:
{code:java}
$ ../OpenJDK-8/java -version
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)

$ ../OpenJDK-8/java Version
jave.specification.version=1.8
jave.version=1.8.0_162
jave.version.split(".")[0]=1

$ ../OpenJDK-9/java -version
openjdk version "9"
OpenJDK Runtime Environment (build 9+181)
OpenJDK 64-Bit Server VM (build 9+181, mixed mode)

$ ../OpenJDK-9/java Version
jave.specification.version=9
jave.version=9
jave.version.split(".")[0]=9

$ ../OpenJDK-11/java -version
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment 18.9 (build 11.0.1+13)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode)

$ ../OpenJDK-11/java Version
jave.specification.version=11
jave.version=11.0.1
jave.version.split(".")[0]=11


$ ../OpenJ9-8/java -version
openjdk version "1.8.0_192"
OpenJDK Runtime Environment (build 1.8.0_192-b12)
Eclipse OpenJ9 VM (build openj9-0.11.0, JRE 1.8.0 Windows 10 amd64-64-Bit 
Compressed References 20181019_105 (JIT enabled, AOT enabled)
OpenJ9   - 090ff9dc
OMR  - ea548a66
JCL  - 51609250b5 based on jdk8u192-b12)

$ ../OpenJ9-8/java Version
jave.specification.version=1.8
jave.version=1.8.0_192
jave.version.split(".")[0]=1

$ ../OpenJ9-9/java -version
openjdk version "9.0.4-adoptopenjdk"
OpenJDK Runtime Environment (build 9.0.4-adoptopenjdk+12)
Eclipse OpenJ9 VM (build openj9-0.9.0, JRE 9 Windows 8.1 amd64-64-Bit 
Compressed References 20180814_161 (JIT enabled, AOT enabled)
OpenJ9   - 24e53631
OMR  - fad6bf6e
JCL  - feec4d2ae based on jdk-9.0.4+12)

$ ../OpenJ9-9/java Version
jave.specification.version=9
jave.version=9.0.4-adoptopenjdk
jave.version.split(".")[0]=9


$ ../OpenJ9-11/java -version
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.1+13)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.11.0, JRE 11 Windows 10 
amd64-64-Bit Compressed References 20181020_83 (JIT enabled, AOT enabled)
OpenJ9   - 090ff9dc
OMR  - ea548a66
JCL  - f62696f378 based on jdk-11.0.1+13)

$ ../OpenJ9-11/java Version
jave.specification.version=11
jave.version=11.0.1
jave.version.split(".")[0]=11


$ ../IBMJDK-8/java -version
java version "1.8.0"
Java(TM) SE Runtime Environment (build pwa6480-20150129_02)
IBM J9 VM (build 2.8, JRE 1.8.0 Windows 8.1 amd64-64 Compressed References 
20150116_231420 (JIT enabled, AOT enabled)
J9VM - R28_Java8_GA_20150116_2030_B231420
JIT  - tr.r14.java_20150109_82886.02
GC   - R28_Java8_GA_20150116_2030_B231420_CMPRSS
J9CL - 20150116_231420)
JCL - 20150123_01 based on Oracle jdk8u31-b12

$ ../IBMJDK-8/java Version
jave.specification.version=1.8
jave.version=1.8.0
jave.version.split(".")[0]=1
{code}
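For illustration only - this is not the actual checkJavaVersion code in SparkR, and the helper name is made up - here is a minimal R sketch of parsing that copes with both numbering schemes shown above (the legacy "1.8.0_162" style as well as the newer "9", "9.0.4-adoptopenjdk" and "11.0.1" style):
{code}
# Hypothetical helper: return the major Java version as an integer.
# "1.8.0_162" -> 8 (legacy scheme), "9.0.4-adoptopenjdk" -> 9, "11.0.1" -> 11.
parseJavaMajorVersion <- function(versionString) {
  parts <- strsplit(versionString, "[.-]")[[1]]
  if (parts[1] == "1") {
    as.integer(parts[2])  # pre-Java-9 scheme: the major version is the second field
  } else {
    as.integer(parts[1])  # Java 9+ scheme: the major version is the first field
  }
}

stopifnot(parseJavaMajorVersion("1.8.0_162") == 8)
stopifnot(parseJavaMajorVersion("9.0.4-adoptopenjdk") == 9)
stopifnot(parseJavaMajorVersion("11.0.1") == 11)
{code}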

> Require Java 8 in SparkR description
> 
>
> Key: SPARK-24255
> URL: https://issues.apache.org/jira/browse/SPARK-24255
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Shivaram Venkataraman
>Assignee: Shivaram Venkataraman
>Priority: Major
> Fix For: 2.3.1, 2.4.0
>
>
> CRAN checks require that the Java version be set both in package description 
> and checked during runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26010) SparkR vignette fails on CRAN on Java 11

2018-11-12 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-26010.
--
   Resolution: Fixed
 Assignee: Felix Cheung
Fix Version/s: 3.0.0
   2.4.1

> SparkR vignette fails on CRAN on Java 11
> 
>
> Key: SPARK-26010
> URL: https://issues.apache.org/jira/browse/SPARK-26010
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>    Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> follow up to SPARK-25572
> but for vignettes
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-11 Thread Felix Cheung
I opened a PR on the vignettes fix to skip eval.
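For context, skipping evaluation can be done with a knitr chunk option set near the top of the vignette; the sketch below only illustrates that idea, and the gating condition and its placement are my assumptions rather than what the PR necessarily does:

library(knitr)
library(SparkR)

# Try to start Spark; if the JVM is unsupported (e.g. Java 11 on the CRAN check
# machines), fall back to not evaluating any chunk, so the vignette still renders
# from its static text instead of failing the build.
canEval <- tryCatch({
  sparkR.session(master = "local[1]", sparkConfig = list(spark.driver.memory = "512m"))
  TRUE
}, error = function(e) FALSE)

opts_chunk$set(eval = canEval)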



From: Shivaram Venkataraman 
Sent: Wednesday, November 7, 2018 7:26 AM
To: Felix Cheung
Cc: Sean Owen; Shivaram Venkataraman; Wenchen Fan; Matei Zaharia; dev
Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

Agree with the points Felix made.

One thing is that it looks like the only problem is vignettes and the
tests are being skipped as designed. If you see
https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/Windows/00check.log
and 
https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/Debian/00check.log,
the tests run in 1s.
On Tue, Nov 6, 2018 at 1:29 PM Felix Cheung  wrote:
>
> I’d rather not mess with 2.4.0 at this point. Being on CRAN is nice, but users can
> also install from an Apache mirror.
>
> Also, I had attempted and failed to get the vignettes not to build; it was
> non-trivial and I couldn't get it to work. But I have an idea.
>
> As for the tests, I don't know exactly why they are not skipped. I need to
> investigate, but worst case test_package can run with 0 tests.
>
>
>
> 
> From: Sean Owen 
> Sent: Tuesday, November 6, 2018 10:51 AM
> To: Shivaram Venkataraman
> Cc: Felix Cheung; Wenchen Fan; Matei Zaharia; dev
> Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
>
> I think the second option, to skip the tests, is best right now, if
> the alternative is to have no SparkR release at all!
> Can we monkey-patch the 2.4.0 release for SparkR in this way, bless it
> from the PMC, and release that? It's drastic but so is not being able
> to release, I think.
> Right? or is CRAN not actually an important distribution path for
> SparkR in particular?
>
> On Tue, Nov 6, 2018 at 12:49 PM Shivaram Venkataraman
>  wrote:
> >
> > Right - I think we should move on with 2.4.0.
> >
> > In terms of what can be done to avoid this error there are two strategies
> > - Felix had this other thread about JDK 11 that should at least let
> > Spark run on the CRAN instance. In general this strategy isn't
> > foolproof because the JDK version and other dependencies on that
> > machine keep changing over time and we dont have much control over it.
> > Worse we also dont have much control
> > - The other solution is to not run code to build the vignettes
> > document and just have static code blocks there that have been
> > pre-evaluated / pre-populated. We can open a JIRA to discuss the
> > pros/cons of this ?
> >
> > Thanks
> > Shivaram
> >
> > On Tue, Nov 6, 2018 at 10:57 AM Felix Cheung  
> > wrote:
> > >
> > > We have not been able to publish to CRAN for quite some time (since 2.3.0 
> > > was archived - the cause is Java 11)
> > >
> > > I think it’s ok to announce the release of 2.4.0
> > >
> > >
> > > 
> > > From: Wenchen Fan 
> > > Sent: Tuesday, November 6, 2018 8:51 AM
> > > To: Felix Cheung
> > > Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman
> > > Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
> > >
> > > Do you mean we should have a 2.4.0 release without CRAN and then do a 
> > > 2.4.1 immediately?
> > >
> > > On Wed, Nov 7, 2018 at 12:34 AM Felix Cheung  
> > > wrote:
> > >>
> > >> Shivaram and I were discussing.
> > >> Actually we worked with them before. Another possible approach is to 
> > >> remove the vignettes eval and all tests from the source package... in the 
> > >> next release.
> > >>
> > >>
> > >> 
> > >> From: Matei Zaharia 
> > >> Sent: Tuesday, November 6, 2018 12:07 AM
> > >> To: Felix Cheung
> > >> Cc: Sean Owen; dev; Shivaram Venkataraman
> > >> Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
> > >>
> > >> Maybe it’s worth contacting the CRAN maintainers to ask for help? 
> > >> Perhaps we aren’t disabling it correctly, or perhaps they can ignore 
> > >> this specific failure. +Shivaram who might have some ideas.
> > >>
> > >> Matei
> > >>
> > >> > On Nov 5, 2018, at 9:09 PM, Felix Cheung  
> > >> > wrote:
> > >> >
> > >> > I don’t know what the cause is yet.
> > >> >
> > >> > The test should be skipped because of this check
> > >> > https://github.com/apache/spark/blob

[jira] [Created] (SPARK-26010) SparkR vignette fails on Java 11

2018-11-11 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-26010:


 Summary: SparkR vignette fails on Java 11
 Key: SPARK-26010
 URL: https://issues.apache.org/jira/browse/SPARK-26010
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 2.4.0, 3.0.0
Reporter: Felix Cheung


follow up to SPARK-25572

but for vignettes

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26010) SparkR vignette fails on CRAN on Java 11

2018-11-11 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26010:
-
Summary: SparkR vignette fails on CRAN on Java 11  (was: SparkR vignette 
fails on Java 11)

> SparkR vignette fails on CRAN on Java 11
> 
>
> Key: SPARK-26010
> URL: https://issues.apache.org/jira/browse/SPARK-26010
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Major
>
> follow up to SPARK-25572
> but for vignettes
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25995) sparkR should ensure user args are after the argument used for the port

2018-11-10 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682762#comment-16682762
 ] 

Felix Cheung commented on SPARK-25995:
--

sparkR is just taking the whole string as-is

[https://github.com/apache/spark/blob/141953f4c44dbad1c2a7059e92bec5fe770af932/R/pkg/R/client.R#L59]

you can see that sparkSubmitOpts comes before args (args is the file with the port number)

I think we should avoid duplicating the submit-arg parsing in R, which we would
need in order to break the string before
{code:java}
fooarg
{code}
right?

Is it easier/better to always set the temp file with port as the last arg 
instead?
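To make the ordering concrete, here is a simplified sketch of the assembly described above; launchBackendSketch is a made-up name and this is not the actual client.R code, it only mirrors the point that the submit options (which carry anything extra the user typed) are pasted in front of the temp file holding the backend port:
{code}
# Simplified illustration of how the launch string is put together:
# sparkSubmitOpts (including any stray user argument such as "fooarg") comes first,
# and the temp file that will receive the backend port is appended after it.
launchBackendSketch <- function(sparkSubmitOpts, portFile) {
  combinedArgs <- paste(sparkSubmitOpts, shQuote(portFile), sep = " ")
  paste("spark-submit", combinedArgs)
}

launchBackendSketch("--master yarn --deploy-mode client sparkr-shell fooarg",
                    "/tmp/Rtmp6XBGz2/backend_port162806ea36bca")
{code}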

 

 

> sparkR should ensure user args are after the argument used for the port
> ---
>
> Key: SPARK-25995
> URL: https://issues.apache.org/jira/browse/SPARK-25995
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.2
>Reporter: Thomas Graves
>Priority: Minor
>
> Currently if you run sparkR and accidentally specify an argument, it fails 
> with a useless error message.  For example:
> $SPARK_HOME/bin/sparkR  --master yarn --deploy-mode client fooarg
> This gets turned into:
> Launching java with spark-submit command spark-submit   "--master" "yarn" 
> "--deploy-mode" "client" "sparkr-shell" "fooarg" 
> /tmp/Rtmp6XBGz2/backend_port162806ea36bca
> Notice that "fooarg" got put before /tmp file which is how R and jvm know 
> which port to connect to.  SparkR eventually fails with timeout exception 
> after 10 seconds.  
>  
> SparkR should either not allow args or make sure the order is correct so the 
> backend_port is always first. see 
> https://github.com/apache/spark/blob/master/R/pkg/R/sparkR.R#L129



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [discuss] SparkR CRAN feasibility check server problem

2018-11-10 Thread Felix Cheung
It’s a great point about the minimum R version. From what I see, mostly because of 
fixes and package support, most users of R are fairly up to date. So perhaps 
3.4 as the minimum version is reasonable, especially for Spark 3.

Are we getting traction with the CRAN sysadmin? It seems like this has been broken 
a few times.



From: Liang-Chi Hsieh 
Sent: Saturday, November 10, 2018 2:32 AM
To: dev@spark.apache.org
Subject: Re: [discuss] SparkR CRAN feasibility check server problem


Yeah, thanks Hyukjin Kwon for bringing this up for discussion.

I don't know how widely higher versions of R are used across the R community. If
R version 3.1.x is not very commonly used, I think we can discuss upgrading the
minimum R version in the next Spark version.

If we end up not upgrading, we can ask the CRAN sysadmin to fix it on the service
side automatically so that malformed R package info is prevented. Then we don't
need to fix it manually every time.



Hyukjin Kwon wrote
>> Would upgrading R be able to fix the issue? Is this perhaps not necessarily
> malformed, but some new format for new versions?
> That's my guess. I am not totally sure about it though.
>
>> Anyway we should consider upgrading the R version if that fixes the problem.
> Yea, we should. If we do, it should be R 3.4 or higher. Maybe it's a
> good
> time to start talking about the minimum R version. 3.1.x is too old. It was
> released 4.5 years ago.
> R 3.4.0 was released 1.5 years ago. Considering the timing for Spark 3.0,
> deprecating lower versions and bumping R up to 3.4 might be a reasonable
> option.
>
> Adding Shane as well.
>
> If we end up not upgrading it, I will forward this email to the CRAN
> sysadmin to discuss further anyway.
>
>
>
> 2018년 11월 2일 (금) 오후 12:51, Felix Cheung <

> felixcheung@

> >님이 작성:
>
>> Thanks for bringing this up, and much appreciated for keeping on top of this
>> at all times.
>>
>> Would upgrading R be able to fix the issue? Is this perhaps not necessarily
>> malformed, but some new format for new versions? Anyway we should
>> consider upgrading the R version if that fixes the problem.
>>
>> As an option we could also disable the repo check in Jenkins but I can
>> see
>> that could also be problematic.
>>
>>
>> On Thu, Nov 1, 2018 at 7:35 PM Hyukjin Kwon <

> gurwls223@

> > wrote:
>>
>>> Hi all,
>>>
>>> I want to raise the CRAN failure issue because it has started to block Spark
>>> PRs from time to time. Since the number
>>> of PRs is growing hugely in the Spark community, it is critical not to block
>>> other PRs.
>>>
>>> There has been a problem at CRAN (See
>>> https://github.com/apache/spark/pull/20005 for analysis).
>>> To cut it short, the root cause is malformed package info from
>>> https://cran.r-project.org/src/contrib/PACKAGES
>>> on the server side, and this had to be fixed by requesting the CRAN
>>> sysadmin's help.
>>>
>>> https://issues.apache.org/jira/browse/SPARK-24152 <- newly open. I am
>>> pretty sure it's the same issue
>>> https://issues.apache.org/jira/browse/SPARK-25923 <- reopen/resolved 2
>>> times
>>> https://issues.apache.org/jira/browse/SPARK-22812
>>>
>>> This has happened 5 times over roughly 10 months, each time blocking
>>> almost all PRs in Apache Spark.
>>> Historically, it once blocked all PRs for a few days, and the whole Spark
>>> community had to stop working.
>>>
>>> I assume this has not been a super big issue so far for other
>>> projects or other people because apparently
>>> higher versions of R have some logic to handle these malformed documents
>>> (at least I verified R 3.4.0 works fine).
>>>
>>> On our side, Jenkins has a low R version (R 3.1.1, if that has not been updated
>>> from what I have seen before),
>>> which is unable to parse the server's malformed response.
>>>
>>> So, I want to talk about how we are going to handle this. Possible
>>> solutions are:
>>>
>>> 1. We should start a talk with CRAN sysadmin to permanently prevent this
>>> issue
>>> 2. We upgrade R to 3.4.0 in Jenkins (however we will not be able to test
>>> low R versions)
>>> 3. ...
>>>
>>> If we are fine with it, I would like to suggest forwarding this email to the CRAN
>>> sysadmin to discuss this further.
>>>
>>> Adding Liang-Chi, Felix and Shivaram, with whom I have already talked about this a few
>>> times before.
>>>
>>> Thanks all.
>>>
>>>
>>>
>>>





--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: DataSourceV2 capability API

2018-11-09 Thread Felix Cheung
One question is where will the list of capability strings be defined?



From: Ryan Blue 
Sent: Thursday, November 8, 2018 2:09 PM
To: Reynold Xin
Cc: Spark Dev List
Subject: Re: DataSourceV2 capability API


Yes, we currently use traits that have methods. Something like “supports 
reading missing columns” doesn’t need to deliver methods. The other example is 
where we don’t have an object to test for a trait 
(scan.isInstanceOf[SupportsBatch]) until we have a Scan with pushdown done. 
That could be expensive so we can use a capability to fail faster.

On Thu, Nov 8, 2018 at 1:54 PM Reynold Xin 
mailto:r...@databricks.com>> wrote:
This is currently accomplished by having traits that data sources can extend, 
as well as runtime exceptions right? It's hard to argue one way vs another 
without knowing how things will evolve (e.g. how many different capabilities 
there will be).


On Thu, Nov 8, 2018 at 12:50 PM Ryan Blue  wrote:

Hi everyone,

I’d like to propose an addition to DataSourceV2 tables, a capability API. This 
API would allow Spark to query a table to determine whether it supports a 
capability or not:

val table = catalog.load(identifier)
val supportsContinuous = table.isSupported("continuous-streaming")


There are a couple of use cases for this. First, we want to be able to fail 
fast when a user tries to stream a table that doesn’t support it. The design of 
our read implementation doesn’t necessarily support this. If we want to share 
the same “scan” across streaming and batch, then we need to “branch” in the API 
after that point, but that is at odds with failing fast. We could use 
capabilities to fail fast and not worry about that concern in the read design.

I also want to use capabilities to change the behavior of some validation 
rules. The rule that validates appends, for example, doesn’t allow a write that 
is missing an optional column. That’s because the current v1 sources don’t 
support reading when columns are missing. But Iceberg does support reading a 
missing column as nulls, so that users can add a column to a table without 
breaking a scheduled job that populates the table. To fix this problem, I would 
use a table capability, like read-missing-columns-as-null.

Any comments on this approach?

rb

--
Ryan Blue
Software Engineer
Netflix


--
Ryan Blue
Software Engineer
Netflix
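
To make the capability proposal above concrete, here is a minimal sketch of how a
capability check could drive a fail-fast path before any scan or pushdown work is
done. The Table trait, ExampleTable class and validateContinuousRead helper below
are illustrative assumptions for this sketch, not the actual DataSourceV2
interfaces; only the capability strings come from the proposal itself.

  // Hypothetical table abstraction exposing a capability query (assumed names).
  trait Table {
    def isSupported(capability: String): Boolean
  }

  // A source that supports continuous streaming and reading missing columns as nulls.
  class ExampleTable extends Table {
    private val capabilities =
      Set("continuous-streaming", "read-missing-columns-as-null")
    override def isSupported(capability: String): Boolean =
      capabilities.contains(capability)
  }

  // Fail fast when a user tries to stream a table that cannot support it.
  def validateContinuousRead(table: Table): Unit =
    if (!table.isSupported("continuous-streaming")) {
      throw new UnsupportedOperationException(
        "Table does not support continuous streaming")
    }

Under the same assumption, an append-validation rule could tolerate a missing
optional column only when table.isSupported("read-missing-columns-as-null") returns
true, instead of failing the write outright.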


Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-09 Thread Felix Cheung
Very cool!



From: Hyukjin Kwon 
Sent: Thursday, November 8, 2018 10:29 AM
To: dev
Subject: Arrow optimization in conversion from R DataFrame to Spark DataFrame

Hi all,

I am trying to introduce R Arrow optimization by reusing PySpark Arrow 
optimization.

It makes R DataFrame to Spark DataFrame conversion up to roughly 900% ~ 1200% faster.

It looks to be working fine so far; however, I would appreciate it if you have some
time to take a look (https://github.com/apache/spark/pull/22954) so that we can
go ahead directly as soon as the R API of Arrow is released.

More importantly, I want more people who are into the Arrow R API side but are
also interested in the Spark side. I have already cc'ed some people I know, but
please come, review and discuss both the Spark side and the Arrow side.

Thanks.



Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-08 Thread Felix Cheung
They were discussed on dev@ in Mar 2018, for example.

Several attempts were made in 2.3.0, 2.3.1, 2.3.2, 2.4.0.
It’s not just tests, the last one is with vignettes.

The current doc about RStudio actually assumes you have the full Spark
distribution (ie from the download page and Apache Mirror) and set SPARK_HOME
etc, which is not hard to do, and the doc also says it is the same for the R shell,
R scripts or other R IDEs, with the exact same steps.




From: Matei Zaharia 
Sent: Wednesday, November 7, 2018 10:32 PM
To: Wenchen Fan
Cc: Shivaram Venkataraman; Felix Cheung; Sean Owen; Spark dev list
Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

I didn’t realize the same thing was broken in 2.3.0, but we should probably 
have made this a blocker for future releases, if it’s just a matter of removing 
things from the test script. We should also make the docs at 
https://spark.apache.org/docs/latest/sparkr.html clear about how we want people 
to run SparkR. They don’t seem to say to use any specific mirror or anything 
(in fact they only talk about how to import SparkR in RStudio and in our 
bin/sparkR, not in a normal R shell). I’m pretty sure it’s OK to update the 
docs website for 2.4.0 after the release to fix this if we want.

Matei

> On Nov 7, 2018, at 6:24 PM, Wenchen Fan  wrote:
>
> Do we need to create a JIRA ticket for it and list it as a known issue in 
> 2.4.0 release notes?
>
> On Wed, Nov 7, 2018 at 11:26 PM Shivaram Venkataraman 
>  wrote:
> Agree with the points Felix made.
>
> One thing is that it looks like the only problem is vignettes and the
> tests are being skipped as designed. If you see
> https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/Windows/00check.log
> and 
> https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/Debian/00check.log,
> the tests run in 1s.
> On Tue, Nov 6, 2018 at 1:29 PM Felix Cheung  wrote:
> >
> > I’d rather not mess with 2.4.0 at this point. Being on CRAN is nice, but users can
> > also install from an Apache Mirror.
> >
> > Also I had attempted, and failed, to get the vignettes not to build; it was
> > non-trivial and I couldn't get it to work, but I have an idea.
> >
> > As for tests, I don't know exactly why they are not skipped. Need to investigate,
> > but worst case test_package can run with 0 tests.
> >
> >
> >
> > ________
> > From: Sean Owen 
> > Sent: Tuesday, November 6, 2018 10:51 AM
> > To: Shivaram Venkataraman
> > Cc: Felix Cheung; Wenchen Fan; Matei Zaharia; dev
> > Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
> >
> > I think the second option, to skip the tests, is best right now, if
> > the alternative is to have no SparkR release at all!
> > Can we monkey-patch the 2.4.0 release for SparkR in this way, bless it
> > from the PMC, and release that? It's drastic but so is not being able
> > to release, I think.
> > Right? or is CRAN not actually an important distribution path for
> > SparkR in particular?
> >
> > On Tue, Nov 6, 2018 at 12:49 PM Shivaram Venkataraman
> >  wrote:
> > >
> > > Right - I think we should move on with 2.4.0.
> > >
> > > In terms of what can be done to avoid this error there are two strategies
> > > - Felix had this other thread about JDK 11 that should at least let
> > > Spark run on the CRAN instance. In general this strategy isn't
> > > foolproof because the JDK version and other dependencies on that
> > > machine keep changing over time and we dont have much control over it.
> > > Worse we also dont have much control
> > > - The other solution is to not run code to build the vignettes
> > > document and just have static code blocks there that have been
> > > pre-evaluated / pre-populated. We can open a JIRA to discuss the
> > > pros/cons of this ?
> > >
> > > Thanks
> > > Shivaram
> > >
> > > On Tue, Nov 6, 2018 at 10:57 AM Felix Cheung  
> > > wrote:
> > > >
> > > > We have not been able to publish to CRAN for quite some time (since 
> > > > 2.3.0 was archived - the cause is Java 11)
> > > >
> > > > I think it’s ok to announce the release of 2.4.0
> > > >
> > > >
> > > > 
> > > > From: Wenchen Fan 
> > > > Sent: Tuesday, November 6, 2018 8:51 AM
> > > > To: Felix Cheung
> > > > Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman
> > > > Subject: Re: [CRAN-pretest-archived] CRAN submission Spark

Re: [Vote] call a vote for IoTDB incubation proposal

2018-11-07 Thread Felix Cheung
+1 cool project


On Wed, Nov 7, 2018 at 1:02 AM Gosling Von  wrote:

> +1
>
> Good luck ~
>
> Von Gosling
>
>
> > 在 2018年11月7日,下午3:46,hxd  写道:
> >
> > Hi,
> > Sorry for the previous mail with bad format.
> > I'd like to call a VOTE to accept IoTDB project, a database for managing
> large amounts of time series data  from IoT sensors in industrial
> applications, into the Apache Incubator.
> > The full proposal is available on the wiki:
> https://wiki.apache.org/incubator/IoTDBProposal
> > and it is also attached below for your convenience.
> >
> > Please cast your vote:
> >
> >  [ ] +1, bring IoTDB into Incubator
> >  [ ] +0, I don't care either way,
> >  [ ] -1, do not bring IoTDB into Incubator, because...
> >
> > The vote will open at least for 72 hours.
> >
> > Thanks,
> > Xiangdong Huang.
> >
> > = IoTDB Proposal  =
> > v0.1.1
> >
> >
> > == Abstract ==
> > IoTDB is a data store for managing large amounts of time series data
> such as timestamped data from IoT sensors in industrial applications.
> >
> > == Proposal ==
> > IoTDB is a database for managing large amount of time series data with
> columnar storage, data encoding, pre-computation, and index techniques. It
> has SQL-like interface to write millions of data points per second per node
> and is optimized to get query results in few seconds over trillions of data
> points. It can also be easily integrated with Apache Hadoop MapReduce and
> Apache Spark for analytics.
> >
> > == Background ==
> >
> > A new class of data management system requirements is becoming
> increasingly important with the rise of the Internet of Things. There are
> some database systems and technologies aimed at time series data
> management.  For example, Gorilla and InfluxDB which are mainly built for
> data centers and monitoring application metrics. Other systems, for
> example, OpenTSDB and KairosDB, are built on Apache HBase and Apache
> Cassandra, respectively.
> >
> > However, many time series data management applications, especially
> industrial applications, have additional requirements as follows:
> >
> > * Supporting time series data which has high data frequency. For
> example, a turbine engine may generate 1000 points per second (i.e.,
> 1000Hz), while each CPU only reports 1 data point per 5 seconds in a data
> center monitoring application.
> >
> > * Supporting scanning data at multiple resolutions. For example,
> aggregation operations are important for time series data.
> >
> > * Supporting special queries for time series, such as pattern matching,
> time series segmentation, time-frequency transformation and frequency query.
> >
> > * Supporting a large number of monitoring targets (i.e. time series). An
> excavator may report more than 1000 time series, for example, revolving
> speed of the motor-engine, the speed of the excavator, the accelerated
> speed, the temperature of the water tank and so on, while a CPU or an
> application monitor has much fewer time series.
> >
> > * Optimization for out-of-order data points. In the industrial sector,
> it is common that equipment sends data using the UDP protocol rather than
> the TCP protocol. Sometimes, the network connection is unstable and parts of
> the data will be buffered for later sending.
> >
> > * Supporting long-term storage. Historical data is precious for
> equipment manufacturers. Therefore, removing or unloading historical data
> is highly desired for most industrial applications. The database system
> must not only support fast retrieval of historical data, but also should
> guarantee that the historical data does not impact the processing speed for
> “hot” or current data.
> >
> > * Supporting online transaction processing (OLTP) as well as complex
> analytics. It is obvious that supporting analyzing from the data files
> using Apache Spark/Apache Hadoop MapReduce directly is better than
> transforming data files to another file format for Big Data analytics.
> >
> > * Flexible deployment either on premise or in the cloud. IoTDB is simple
> and can be deployed on a Raspberry Pi handling hundreds of time
> series. Meanwhile, the system can also be deployed in the cloud so that it
> supports tens of millions of ingestions per second, OLTP queries in
> milliseconds, and analytics using Apache Spark/Apache Hadoop MapReduce.
> >
> > * * (1) If users deploy IoTDB on a device, such as a Raspberry Pi, a
> wind turbine, or a meteorological station, the deployment of the chosen
> database is designed to be simple. A device may have hundreds of time
> series (but less than a thousand time series) and the database needs to
> handle them.
> > * * (2) When deploying IoTDB in a data center, the computational
> resources (i.e., the hardware configuration of servers) are not a problem
> when compared to a Raspberry Pi. In this deployment, IoTDB can use more
> computation resources, and has the ability to handle more time series
> (e.g., millions of time series).
> >
> > Based on these requirements, we developed Io

Re: Test and support only LTS JDK release?

2018-11-06 Thread Felix Cheung
Is there a list of LTS releases that I can reference?



From: Ryan Blue 
Sent: Tuesday, November 6, 2018 1:28 PM
To: sn...@snazy.de
Cc: Spark Dev List; cdelg...@apple.com
Subject: Re: Test and support only LTS JDK release?

+1 for supporting LTS releases.

On Tue, Nov 6, 2018 at 11:48 AM Robert Stupp 
mailto:sn...@snazy.de>> wrote:

+1 on supporting LTS releases.

VM distributors (RedHat, Azul - to name two) want to provide patches to LTS 
versions (i.e. into http://hg.openjdk.java.net/jdk-updates/jdk11u/). How that 
will play out in reality ... I don't know. Whether Oracle will contribute to 
that repo for 8 after it's EOL and 11 after the 6 month cycle ... we will see. 
Most Linux distributions promised(?) long-term support for Java 11 in their LTS 
releases (e.g. Ubuntu 18.04). I am not sure what that exactly means ... whether 
they will actively provide patches to OpenJDK or whether they just build from 
source.

But considering that, I think it's definitely worth to at least keep an eye on 
Java 12 and 13 - even if those are just EA. Java 12 for example does already 
forbid some "dirty tricks" that are still possible in Java 11.


On 11/6/18 8:32 PM, DB Tsai wrote:
OpenJDK will follow Oracle's release cycle, 
https://openjdk.java.net/projects/jdk/, a strict six months model. I'm not 
familiar with other non-Oracle VMs and Redhat support.

DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, Inc

On Nov 6, 2018, at 11:26 AM, Reynold Xin 
mailto:r...@databricks.com>> wrote:

What does OpenJDK do and other non-Oracle VMs? I know there was a lot of 
discussions from Redhat etc to support.


On Tue, Nov 6, 2018 at 11:24 AM DB Tsai 
mailto:d_t...@apple.com>> wrote:
Given Oracle's new 6-month release model, I feel the only realistic option is 
to only test and support JDK such as JDK 11 LTS and future LTS release. I would 
like to have a discussion on this in Spark community.

Thanks,

DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, Inc



--
Robert Stupp
@snazy


--
Ryan Blue
Software Engineer
Netflix


Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-06 Thread Felix Cheung
So to clarify, only scala 2.12 is supported in Spark 3?



From: Ryan Blue 
Sent: Tuesday, November 6, 2018 1:24 PM
To: d_t...@apple.com
Cc: Sean Owen; Spark Dev List; cdelg...@apple.com
Subject: Re: Make Scala 2.12 as default Scala version in Spark 3.0

+1 to Scala 2.12 as the default in Spark 3.0.

On Tue, Nov 6, 2018 at 11:50 AM DB Tsai 
mailto:d_t...@apple.com>> wrote:
+1 on dropping Scala 2.11 in Spark 3.0 to simplify the build.

As Scala 2.11 will not support Java 11 unless we make a significant investment, 
if we decide not to drop Scala 2.11 in Spark 3.0, what we can do is have only 
Scala 2.12 build support Java 11 while Scala 2.11 support Java 8. But I agree 
with Sean that this can make the dependencies really complicated; hence I support
dropping Scala 2.11 in Spark 3.0 directly.

DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, Inc

On Nov 6, 2018, at 11:38 AM, Sean Owen 
mailto:sro...@gmail.com>> wrote:

I think we should make Scala 2.12 the default in Spark 3.0. I would
also prefer to drop Scala 2.11 support in 3.0. In theory, not dropping
2.11 support means we'd support Scala 2.11 for years, the lifetime
of Spark 3.x. In practice, we could drop 2.11 support in a 3.1.0 or
3.2.0 release, kind of like what happened with 2.10 in 2.x.

Java (9-)11 support also complicates this. I think getting it to work
will need some significant dependency updates, and I worry not all
will be available for 2.11 or will present some knotty problems. We'll
find out soon if that forces the issue.

Also note that Scala 2.13 is pretty close to release, and we'll want
to support it soon after release, perhaps sooner than the long delay
before 2.12 was supported (because it was hard!). It will probably be
out well before Spark 3.0. Cross-compiling for 3 Scala versions sounds
like too much. 3.0 could support 2.11 and 2.12, and 3.1 support 2.12
and 2.13, or something. But if 2.13 support is otherwise attainable at
the release of Spark 3.0, I wonder if that too argues for dropping
2.11 support.

Finally I'll say that Spark itself isn't dropping 2.11 support for a
while, no matter what; it still exists in the 2.4.x branch of course.
People who can't update off Scala 2.11 can stay on Spark 2.x, note.

Sean


On Tue, Nov 6, 2018 at 1:13 PM DB Tsai 
mailto:d_t...@apple.com>> wrote:

We made Scala 2.11 the default Scala version in Spark 2.0. Now, the next Spark
version will be 3.0, so it's a great time to discuss whether we should make Scala 2.12
the default Scala version in Spark 3.0.

Scala 2.11 is EOL, and it came out 4.5 years ago; as a result, it's unlikely to
support JDK 11 in Scala 2.11 unless we're willing to sponsor the needed work 
per discussion in Scala community, 
https://github.com/scala/scala-dev/issues/559#issuecomment-436160166

We have initial support of Scala 2.12 in Spark 2.4. If we decide to make Scala 
2.12 the default for Spark 3.0 now, we will have ample time to work on bugs and
issues that we may run into.

What do you think?

Thanks,

DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, Inc


-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org




--
Ryan Blue
Software Engineer
Netflix


Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-06 Thread Felix Cheung
I’d rather not mess with 2.4.0 at this point. Being on CRAN is nice, but users can
also install from an Apache Mirror.

Also I had attempted, and failed, to get the vignettes not to build; it was
non-trivial and I couldn't get it to work, but I have an idea.

As for tests, I don't know exactly why they are not skipped. Need to investigate, but
worst case test_package can run with 0 tests.




From: Sean Owen 
Sent: Tuesday, November 6, 2018 10:51 AM
To: Shivaram Venkataraman
Cc: Felix Cheung; Wenchen Fan; Matei Zaharia; dev
Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

I think the second option, to skip the tests, is best right now, if
the alternative is to have no SparkR release at all!
Can we monkey-patch the 2.4.0 release for SparkR in this way, bless it
from the PMC, and release that? It's drastic but so is not being able
to release, I think.
Right? or is CRAN not actually an important distribution path for
SparkR in particular?

On Tue, Nov 6, 2018 at 12:49 PM Shivaram Venkataraman
 wrote:
>
> Right - I think we should move on with 2.4.0.
>
> In terms of what can be done to avoid this error there are two strategies
> - Felix had this other thread about JDK 11 that should at least let
> Spark run on the CRAN instance. In general this strategy isn't
> foolproof because the JDK version and other dependencies on that
> machine keep changing over time and we dont have much control over it.
> Worse we also dont have much control
> - The other solution is to not run code to build the vignettes
> document and just have static code blocks there that have been
> pre-evaluated / pre-populated. We can open a JIRA to discuss the
> pros/cons of this ?
>
> Thanks
> Shivaram
>
> On Tue, Nov 6, 2018 at 10:57 AM Felix Cheung  
> wrote:
> >
> > We have not been able to publish to CRAN for quite some time (since 2.3.0 
> > was archived - the cause is Java 11)
> >
> > I think it’s ok to announce the release of 2.4.0
> >
> >
> > ____
> > From: Wenchen Fan 
> > Sent: Tuesday, November 6, 2018 8:51 AM
> > To: Felix Cheung
> > Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman
> > Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
> >
> > Do you mean we should have a 2.4.0 release without CRAN and then do a 2.4.1 
> > immediately?
> >
> > On Wed, Nov 7, 2018 at 12:34 AM Felix Cheung  
> > wrote:
> >>
> >> Shivaram and I were discussing.
> >> Actually we worked with them before. Another possible approach is to 
> >> remove the vignettes eval and all test from the source package... in the 
> >> next release.
> >>
> >>
> >> 
> >> From: Matei Zaharia 
> >> Sent: Tuesday, November 6, 2018 12:07 AM
> >> To: Felix Cheung
> >> Cc: Sean Owen; dev; Shivaram Venkataraman
> >> Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
> >>
> >> Maybe it’s worth contacting the CRAN maintainers to ask for help? Perhaps
> >> we aren’t disabling it correctly, or perhaps they can ignore this specific 
> >> failure. +Shivaram who might have some ideas.
> >>
> >> Matei
> >>
> >> > On Nov 5, 2018, at 9:09 PM, Felix Cheung  
> >> > wrote:
> >> >
> >> > I don’t know what the cause is yet.
> >> >
> >> > The test should be skipped because of this check
> >> > https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L21
> >> >
> >> > And this
> >> > https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L57
> >> >
> >> > But it ran:
> >> > callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", 
> >> > "fit", formula,
> >> >
> >> > The earlier release was archived because of Java 11+ too, so this
> >> > unfortunately isn’t new.
> >> >
> >> >
> >> > From: Sean Owen 
> >> > Sent: Monday, November 5, 2018 7:22 PM
> >> > To: Felix Cheung
> >> > Cc: dev
> >> > Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
> >> >
> >> > What can we do to get the release through? is there any way to
> >> > circumvent these tests or otherwise hack it? or does it need a
> >> > maintenance release?
> >> > On Mon, Nov 5, 2018 at 8:53 PM Felix Cheung  
> >> > wrote:
> >> > >
> >> > > FYI. SparkR

Re: Java 11 support

2018-11-06 Thread Felix Cheung
+1 for Spark 3, definitely
Thanks for the updates



From: Sean Owen 
Sent: Tuesday, November 6, 2018 9:11 AM
To: Felix Cheung
Cc: dev
Subject: Re: Java 11 support

I think that Java 9 support basically gets Java 10, 11 support. But
the jump from 8 to 9 is unfortunately more breaking than usual because
of the total revamping of the internal JDK classes. I think it will be
mostly a matter of dependencies needing updates to work. I agree this
is probably pretty important for Spark 3. Here's the ticket I know of:
https://issues.apache.org/jira/browse/SPARK-24417 . DB is already
working on some of it, I see.
On Tue, Nov 6, 2018 at 10:59 AM Felix Cheung  wrote:
>
> Speaking of, can we work to support Java 11?
> That will fix all the problems below.
>
>
>
> ________
> From: Felix Cheung 
> Sent: Tuesday, November 6, 2018 8:57 AM
> To: Wenchen Fan
> Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman
> Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
>
> We have not been able to publish to CRAN for quite some time (since 2.3.0 was 
> archived - the cause is Java 11)
>
> I think it’s ok to announce the release of 2.4.0
>
>
> 
> From: Wenchen Fan 
> Sent: Tuesday, November 6, 2018 8:51 AM
> To: Felix Cheung
> Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman
> Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
>
> Do you mean we should have a 2.4.0 release without CRAN and then do a 2.4.1 
> immediately?
>
> On Wed, Nov 7, 2018 at 12:34 AM Felix Cheung  
> wrote:
>>
>> Shivaram and I were discussing.
>> Actually we worked with them before. Another possible approach is to remove 
>> the vignettes eval and all test from the source package... in the next 
>> release.
>>
>>
>> 
>> From: Matei Zaharia 
>> Sent: Tuesday, November 6, 2018 12:07 AM
>> To: Felix Cheung
>> Cc: Sean Owen; dev; Shivaram Venkataraman
>> Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
>>
>> Maybe it’s worth contacting the CRAN maintainers to ask for help? Perhaps we
>> aren’t disabling it correctly, or perhaps they can ignore this specific 
>> failure. +Shivaram who might have some ideas.
>>
>> Matei
>>
>> > On Nov 5, 2018, at 9:09 PM, Felix Cheung  wrote:
>> >
>> > I don’t know what the cause is yet.
>> >
>> > The test should be skipped because of this check
>> > https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L21
>> >
>> > And this
>> > https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L57
>> >
>> > But it ran:
>> > callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", 
>> > "fit", formula,
>> >
>> > The earlier release was archived because of Java 11+ too, so this
>> > unfortunately isn’t new.
>> >
>> >
>> > From: Sean Owen 
>> > Sent: Monday, November 5, 2018 7:22 PM
>> > To: Felix Cheung
>> > Cc: dev
>> > Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
>> >
>> > What can we do to get the release through? is there any way to
>> > circumvent these tests or otherwise hack it? or does it need a
>> > maintenance release?
>> > On Mon, Nov 5, 2018 at 8:53 PM Felix Cheung  
>> > wrote:
>> > >
>> > > FYI. SparkR submission failed. It seems to detect Java 11 correctly with 
>> > > vignettes but not skipping tests as would be expected.
>> > >
>> > > Error: processing vignette ‘sparkr-vignettes.Rmd’ failed with
>> > > diagnostics:
>> > > Java version 8 is required for this package; found version: 11.0.1
>> > > Execution halted
>> > >
>> > > * checking PDF version of manual ... OK
>> > > * DONE
>> > > Status: 1 WARNING, 1 NOTE
>> > >
>> > > Current CRAN status: ERROR: 1, OK: 1
>> > > See: <https://CRAN.R-project.org/web/checks/check_results_SparkR.html>
>> > >
>> > > Version: 2.3.0
>> > > Check: tests, Result: ERROR
>> > > Running ‘run-all.R’ [8s/35s]
>> > > Running the tests in ‘tests/run-all.R’ failed.
>> > > Last 13 lines of output:
>> > > 4: 
>> > > callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", 
>> > > &

Java 11 support

2018-11-06 Thread Felix Cheung
Speaking of, can we work to support Java 11?
That will fix all the problems below.




From: Felix Cheung 
Sent: Tuesday, November 6, 2018 8:57 AM
To: Wenchen Fan
Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman
Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

We have not been able to publish to CRAN for quite some time (since 2.3.0 was 
archived - the cause is Java 11)

I think it’s ok to announce the release of 2.4.0



From: Wenchen Fan 
Sent: Tuesday, November 6, 2018 8:51 AM
To: Felix Cheung
Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman
Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

Do you mean we should have a 2.4.0 release without CRAN and then do a 2.4.1 
immediately?

On Wed, Nov 7, 2018 at 12:34 AM Felix Cheung 
mailto:felixcheun...@hotmail.com>> wrote:
Shivaram and I were discussing.
Actually we worked with them before. Another possible approach is to remove the 
vignettes eval and all test from the source package... in the next release.



From: Matei Zaharia mailto:matei.zaha...@gmail.com>>
Sent: Tuesday, November 6, 2018 12:07 AM
To: Felix Cheung
Cc: Sean Owen; dev; Shivaram Venkataraman
Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

Maybe it’s worth contacting the CRAN maintainers to ask for help? Perhaps we
aren’t disabling it correctly, or perhaps they can ignore this specific 
failure. +Shivaram who might have some ideas.

Matei

> On Nov 5, 2018, at 9:09 PM, Felix Cheung 
> mailto:felixcheun...@hotmail.com>> wrote:
>
> I don’t know what the cause is yet.
>
> The test should be skipped because of this check
> https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L21
>
> And this
> https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L57
>
> But it ran:
> callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", 
> "fit", formula,
>
> The earlier release was archived because of Java 11+ too, so this unfortunately
> isn’t new.
>
>
> From: Sean Owen mailto:sro...@gmail.com>>
> Sent: Monday, November 5, 2018 7:22 PM
> To: Felix Cheung
> Cc: dev
> Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
>
> What can we do to get the release through? is there any way to
> circumvent these tests or otherwise hack it? or does it need a
> maintenance release?
> On Mon, Nov 5, 2018 at 8:53 PM Felix Cheung 
> mailto:felixcheun...@hotmail.com>> wrote:
> >
> > FYI. SparkR submission failed. It seems to detect Java 11 correctly with 
> > vignettes but not skipping tests as would be expected.
> >
> > Error: processing vignette ‘sparkr-vignettes.Rmd’ failed with diagnostics:
> > Java version 8 is required for this package; found version: 11.0.1
> > Execution halted
> >
> > * checking PDF version of manual ... OK
> > * DONE
> > Status: 1 WARNING, 1 NOTE
> >
> > Current CRAN status: ERROR: 1, OK: 1
> > See: <https://CRAN.R-project.org/web/checks/check_results_SparkR.html>
> >
> > Version: 2.3.0
> > Check: tests, Result: ERROR
> > Running ‘run-all.R’ [8s/35s]
> > Running the tests in ‘tests/run-all.R’ failed.
> > Last 13 lines of output:
> > 4: callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", 
> > "fit", formula,
> > data@sdf, tolower(family$family), family$link, tol, as.integer(maxIter), 
> > weightCol,
> > regParam, as.double(var.power), as.double(link.power), 
> > stringIndexerOrderType,
> > offsetCol)
> > 5: invokeJava(isStatic = TRUE, className, methodName, ...)
> > 6: handleErrors(returnStatus, conn)
> > 7: stop(readString(conn))
> >
> > ══ testthat results 
> > ═══
> > OK: 0 SKIPPED: 0 FAILED: 2
> > 1. Error: create DataFrame from list or data.frame (@test_basic.R#26)
> > 2. Error: spark.glm and predict (@test_basic.R#58)
> >
> >
> >
> > -- Forwarded message -
> > Date: Mon, Nov 5, 2018, 10:12
> > Subject: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
> >
> > Dear maintainer,
> >
> > package SparkR_2.4.0.tar.gz does not pass the incoming checks 
> > automatically, please see the following pre-tests:
> > Windows: 
> > <https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/Windows/00check.log>
> > Status: 1 NOTE
> > Debian: 
> > <https://win-builder.r

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-06 Thread Felix Cheung
We have not been able to publish to CRAN for quite some time (since 2.3.0 was 
archived - the cause is Java 11)

I think it’s ok to announce the release of 2.4.0



From: Wenchen Fan 
Sent: Tuesday, November 6, 2018 8:51 AM
To: Felix Cheung
Cc: Matei Zaharia; Sean Owen; Spark dev list; Shivaram Venkataraman
Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

Do you mean we should have a 2.4.0 release without CRAN and then do a 2.4.1 
immediately?

On Wed, Nov 7, 2018 at 12:34 AM Felix Cheung 
mailto:felixcheun...@hotmail.com>> wrote:
Shivaram and I were discussing.
Actually we worked with them before. Another possible approach is to remove the 
vignettes eval and all test from the source package... in the next release.



From: Matei Zaharia mailto:matei.zaha...@gmail.com>>
Sent: Tuesday, November 6, 2018 12:07 AM
To: Felix Cheung
Cc: Sean Owen; dev; Shivaram Venkataraman
Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

Maybe it’s worth contacting the CRAN maintainers to ask for help? Perhaps we
aren’t disabling it correctly, or perhaps they can ignore this specific 
failure. +Shivaram who might have some ideas.

Matei

> On Nov 5, 2018, at 9:09 PM, Felix Cheung 
> mailto:felixcheun...@hotmail.com>> wrote:
>
> I don’t know what the cause is yet.
>
> The test should be skipped because of this check
> https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L21
>
> And this
> https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L57
>
> But it ran:
> callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", 
> "fit", formula,
>
> The earlier release was archived because of Java 11+ too, so this unfortunately
> isn’t new.
>
>
> From: Sean Owen mailto:sro...@gmail.com>>
> Sent: Monday, November 5, 2018 7:22 PM
> To: Felix Cheung
> Cc: dev
> Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
>
> What can we do to get the release through? is there any way to
> circumvent these tests or otherwise hack it? or does it need a
> maintenance release?
> On Mon, Nov 5, 2018 at 8:53 PM Felix Cheung 
> mailto:felixcheun...@hotmail.com>> wrote:
> >
> > FYI. SparkR submission failed. It seems to detect Java 11 correctly with 
> > vignettes but not skipping tests as would be expected.
> >
> > Error: processing vignette ‘sparkr-vignettes.Rmd’ failed with diagnostics:
> > Java version 8 is required for this package; found version: 11.0.1
> > Execution halted
> >
> > * checking PDF version of manual ... OK
> > * DONE
> > Status: 1 WARNING, 1 NOTE
> >
> > Current CRAN status: ERROR: 1, OK: 1
> > See: <https://CRAN.R-project.org/web/checks/check_results_SparkR.html>
> >
> > Version: 2.3.0
> > Check: tests, Result: ERROR
> > Running ‘run-all.R’ [8s/35s]
> > Running the tests in ‘tests/run-all.R’ failed.
> > Last 13 lines of output:
> > 4: callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", 
> > "fit", formula,
> > data@sdf, tolower(family$family), family$link, tol, as.integer(maxIter), 
> > weightCol,
> > regParam, as.double(var.power), as.double(link.power), 
> > stringIndexerOrderType,
> > offsetCol)
> > 5: invokeJava(isStatic = TRUE, className, methodName, ...)
> > 6: handleErrors(returnStatus, conn)
> > 7: stop(readString(conn))
> >
> > ══ testthat results 
> > ═══
> > OK: 0 SKIPPED: 0 FAILED: 2
> > 1. Error: create DataFrame from list or data.frame (@test_basic.R#26)
> > 2. Error: spark.glm and predict (@test_basic.R#58)
> >
> >
> >
> > -- Forwarded message -
> > Date: Mon, Nov 5, 2018, 10:12
> > Subject: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
> >
> > Dear maintainer,
> >
> > package SparkR_2.4.0.tar.gz does not pass the incoming checks 
> > automatically, please see the following pre-tests:
> > Windows: 
> > <https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/Windows/00check.log>
> > Status: 1 NOTE
> > Debian: 
> > <https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/Debian/00check.log>
> > Status: 1 WARNING, 1 NOTE
> >
> > Last released version's CRAN status: ERROR: 1, OK: 1
> > See: <https://CRAN.R-project.org/web/checks/check_results_SparkR.html>
> >
> > CRAN Web: <https:/

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-06 Thread Felix Cheung
Shivaram and I were discussing.
Actually we worked with them before. Another possible approach is to remove the 
vignettes eval and all test from the source package... in the next release.



From: Matei Zaharia 
Sent: Tuesday, November 6, 2018 12:07 AM
To: Felix Cheung
Cc: Sean Owen; dev; Shivaram Venkataraman
Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

Maybe it’s worth contacting the CRAN maintainers to ask for help? Perhaps we
aren’t disabling it correctly, or perhaps they can ignore this specific 
failure. +Shivaram who might have some ideas.

Matei

> On Nov 5, 2018, at 9:09 PM, Felix Cheung  wrote:
>
> I don’t know what the cause is yet.
>
> The test should be skipped because of this check
> https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L21
>
> And this
> https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L57
>
> But it ran:
> callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", 
> "fit", formula,
>
> The earlier release was archived because of Java 11+ too, so this unfortunately
> isn’t new.
>
>
> From: Sean Owen 
> Sent: Monday, November 5, 2018 7:22 PM
> To: Felix Cheung
> Cc: dev
> Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
>
> What can we do to get the release through? is there any way to
> circumvent these tests or otherwise hack it? or does it need a
> maintenance release?
> On Mon, Nov 5, 2018 at 8:53 PM Felix Cheung  wrote:
> >
> > FYI. SparkR submission failed. It seems to detect Java 11 correctly with 
> > vignettes but not skipping tests as would be expected.
> >
> > Error: processing vignette ‘sparkr-vignettes.Rmd’ failed with diagnostics:
> > Java version 8 is required for this package; found version: 11.0.1
> > Execution halted
> >
> > * checking PDF version of manual ... OK
> > * DONE
> > Status: 1 WARNING, 1 NOTE
> >
> > Current CRAN status: ERROR: 1, OK: 1
> > See: <https://CRAN.R-project.org/web/checks/check_results_SparkR.html>
> >
> > Version: 2.3.0
> > Check: tests, Result: ERROR
> > Running ‘run-all.R’ [8s/35s]
> > Running the tests in ‘tests/run-all.R’ failed.
> > Last 13 lines of output:
> > 4: callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", 
> > "fit", formula,
> > data@sdf, tolower(family$family), family$link, tol, as.integer(maxIter), 
> > weightCol,
> > regParam, as.double(var.power), as.double(link.power), 
> > stringIndexerOrderType,
> > offsetCol)
> > 5: invokeJava(isStatic = TRUE, className, methodName, ...)
> > 6: handleErrors(returnStatus, conn)
> > 7: stop(readString(conn))
> >
> > ══ testthat results 
> > ═══
> > OK: 0 SKIPPED: 0 FAILED: 2
> > 1. Error: create DataFrame from list or data.frame (@test_basic.R#26)
> > 2. Error: spark.glm and predict (@test_basic.R#58)
> >
> >
> >
> > -- Forwarded message -
> > Date: Mon, Nov 5, 2018, 10:12
> > Subject: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
> >
> > Dear maintainer,
> >
> > package SparkR_2.4.0.tar.gz does not pass the incoming checks 
> > automatically, please see the following pre-tests:
> > Windows: 
> > <https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/Windows/00check.log>
> > Status: 1 NOTE
> > Debian: 
> > <https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/Debian/00check.log>
> > Status: 1 WARNING, 1 NOTE
> >
> > Last released version's CRAN status: ERROR: 1, OK: 1
> > See: <https://CRAN.R-project.org/web/checks/check_results_SparkR.html>
> >
> > CRAN Web: <https://cran.r-project.org/package=SparkR>
> >
> > Please fix all problems and resubmit a fixed version via the webform.
> > If you are not sure how to fix the problems shown, please ask for help on 
> > the R-package-devel mailing list:
> > <https://stat.ethz.ch/mailman/listinfo/r-package-devel>
> > If you are fairly certain the rejection is a false positive, please 
> > reply-all to this message and explain.
> >
> > More details are given in the directory:
> > <https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/>
> > The files will be removed after roughly 7 days.
> >
> > No strong reverse dependencies to be checked.
> >

Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-05 Thread Felix Cheung
I don’t know what the cause is yet.

The test should be skipped because of this check
https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L21

And this
https://github.com/apache/spark/blob/branch-2.4/R/pkg/inst/tests/testthat/test_basic.R#L57

But it ran:
callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", "fit", 
formula,

The earlier release was archived because of Java 11+ too, so this unfortunately
isn’t new.



From: Sean Owen 
Sent: Monday, November 5, 2018 7:22 PM
To: Felix Cheung
Cc: dev
Subject: Re: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

What can we do to get the release through? is there any way to
circumvent these tests or otherwise hack it? or does it need a
maintenance release?
On Mon, Nov 5, 2018 at 8:53 PM Felix Cheung  wrote:
>
> FYI. SparkR submission failed. It seems to detect Java 11 correctly with 
> vignettes but not skipping tests as would be expected.
>
> Error: processing vignette ‘sparkr-vignettes.Rmd’ failed with diagnostics:
> Java version 8 is required for this package; found version: 11.0.1
> Execution halted
>
> * checking PDF version of manual ... OK
> * DONE
> Status: 1 WARNING, 1 NOTE
>
> Current CRAN status: ERROR: 1, OK: 1
> See: <https://CRAN.R-project.org/web/checks/check_results_SparkR.html>
>
> Version: 2.3.0
> Check: tests, Result: ERROR
> Running ‘run-all.R’ [8s/35s]
> Running the tests in ‘tests/run-all.R’ failed.
> Last 13 lines of output:
> 4: callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", 
> "fit", formula,
> data@sdf, tolower(family$family), family$link, tol, as.integer(maxIter), 
> weightCol,
> regParam, as.double(var.power), as.double(link.power), stringIndexerOrderType,
> offsetCol)
> 5: invokeJava(isStatic = TRUE, className, methodName, ...)
> 6: handleErrors(returnStatus, conn)
> 7: stop(readString(conn))
>
> ══ testthat results 
> ═══
> OK: 0 SKIPPED: 0 FAILED: 2
> 1. Error: create DataFrame from list or data.frame (@test_basic.R#26)
> 2. Error: spark.glm and predict (@test_basic.R#58)
>
>
>
> -- Forwarded message -
> Date: Mon, Nov 5, 2018, 10:12
> Subject: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0
>
> Dear maintainer,
>
> package SparkR_2.4.0.tar.gz does not pass the incoming checks automatically, 
> please see the following pre-tests:
> Windows: 
> <https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/Windows/00check.log>
> Status: 1 NOTE
> Debian: 
> <https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/Debian/00check.log>
> Status: 1 WARNING, 1 NOTE
>
> Last released version's CRAN status: ERROR: 1, OK: 1
> See: <https://CRAN.R-project.org/web/checks/check_results_SparkR.html>
>
> CRAN Web: <https://cran.r-project.org/package=SparkR>
>
> Please fix all problems and resubmit a fixed version via the webform.
> If you are not sure how to fix the problems shown, please ask for help on the 
> R-package-devel mailing list:
> <https://stat.ethz.ch/mailman/listinfo/r-package-devel>
> If you are fairly certain the rejection is a false positive, please reply-all 
> to this message and explain.
>
> More details are given in the directory:
> <https://win-builder.r-project.org/incoming_pretest/SparkR_2.4.0_20181105_165757/>
> The files will be removed after roughly 7 days.
>
> No strong reverse dependencies to be checked.
>
> Best regards,
> CRAN teams' auto-check service
> Flavor: r-devel-linux-x86_64-debian-gcc, r-devel-windows-ix86+x86_64
> Check: CRAN incoming feasibility, Result: NOTE
> Maintainer: 'Shivaram Venkataraman '
>
> New submission
>
> Package was archived on CRAN
>
> Possibly mis-spelled words in DESCRIPTION:
> Frontend (4:10, 5:28)
>
> CRAN repository db overrides:
> X-CRAN-Comment: Archived on 2018-05-01 as check problems were not
> corrected despite reminders.
>
> Flavor: r-devel-linux-x86_64-debian-gcc
> Check: re-building of vignette outputs, Result: WARNING
> Error in re-building vignettes:
> ...
>
> Attaching package: 'SparkR'
>
> The following objects are masked from 'package:stats':
>
> cov, filter, lag, na.omit, predict, sd, var, window
>
> The following objects are masked from 'package:base':
>
> as.data.frame, colnames, colnames<-, drop, endsWith,
> intersect, rank, rbind, sample, startsWith, subset, summary,
> transform, union
>
> trying URL 
> 'http://mirror.klaus-uwe.me/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz'
> Content type 'application/octet-stream' length 227893062 bytes (217.3 MB)
> ==
> downloaded 217.3 MB
>
> Quitting from lines 65-67 (sparkr-vignettes.Rmd)
> Error: processing vignette 'sparkr-vignettes.Rmd' failed with diagnostics:
> Java version 8 is required for this package; found version: 11.0.1
> Execution halted


Fwd: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

2018-11-05 Thread Felix Cheung
FYI. SparkR submission failed. It seems to detect Java 11 correctly with 
vignettes but not skipping tests as would be expected.

Error: processing vignette ‘sparkr-vignettes.Rmd’ failed with diagnostics:
Java version 8 is required for this package; found version: 11.0.1
Execution halted

* checking PDF version of manual ... OK
* DONE
Status: 1 WARNING, 1 NOTE

Current CRAN status: ERROR: 1, OK: 1
See: 

Version: 2.3.0
Check: tests, Result: ERROR
Running ‘run-all.R’ [8s/35s]
  Running the tests in ‘tests/run-all.R’ failed.
  Last 13 lines of output:
4: callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", 
"fit", formula,
   data@sdf, tolower(family$family), family$link, tol, 
as.integer(maxIter), weightCol,
   regParam, as.double(var.power), as.double(link.power), 
stringIndexerOrderType,
   offsetCol)
5: invokeJava(isStatic = TRUE, className, methodName, ...)
6: handleErrors(returnStatus, conn)
7: stop(readString(conn))

══ testthat results 
═══
OK: 0 SKIPPED: 0 FAILED: 2
1. Error: create DataFrame from list or data.frame (@test_basic.R#26)
2. Error: spark.glm and predict (@test_basic.R#58)



-- Forwarded message -
Date: Mon, Nov 5, 2018, 10:12
Subject: [CRAN-pretest-archived] CRAN submission SparkR 2.4.0

Dear maintainer,

package SparkR_2.4.0.tar.gz does not pass the incoming checks automatically, 
please see the following pre-tests:
Windows: 

Status: 1 NOTE
Debian: 

Status: 1 WARNING, 1 NOTE

Last released version's CRAN status: ERROR: 1, OK: 1
See: 

CRAN Web: 

Please fix all problems and resubmit a fixed version via the webform.
If you are not sure how to fix the problems shown, please ask for help on the 
R-package-devel mailing list:

If you are fairly certain the rejection is a false positive, please reply-all 
to this message and explain.

More details are given in the directory:

The files will be removed after roughly 7 days.

No strong reverse dependencies to be checked.

Best regards,
CRAN teams' auto-check service
Flavor: r-devel-linux-x86_64-debian-gcc, r-devel-windows-ix86+x86_64
Check: CRAN incoming feasibility, Result: NOTE
  Maintainer: 'Shivaram Venkataraman 
mailto:shiva...@cs.berkeley.edu>>'

  New submission

  Package was archived on CRAN

  Possibly mis-spelled words in DESCRIPTION:
Frontend (4:10, 5:28)

  CRAN repository db overrides:
X-CRAN-Comment: Archived on 2018-05-01 as check problems were not
  corrected despite reminders.

Flavor: r-devel-linux-x86_64-debian-gcc
Check: re-building of vignette outputs, Result: WARNING
  Error in re-building vignettes:
...

  Attaching package: 'SparkR'

  The following objects are masked from 'package:stats':

  cov, filter, lag, na.omit, predict, sd, var, window

  The following objects are masked from 'package:base':

  as.data.frame, colnames, colnames<-, drop, endsWith,
  intersect, rank, rbind, sample, startsWith, subset, summary,
  transform, union

  trying URL 
'http://mirror.klaus-uwe.me/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz'
  Content type 'application/octet-stream' length 227893062 bytes (217.3 MB)
  ==
  downloaded 217.3 MB

  Quitting from lines 65-67 (sparkr-vignettes.Rmd)
  Error: processing vignette 'sparkr-vignettes.Rmd' failed with diagnostics:
  Java version 8 is required for this package; found version: 11.0.1
  Execution halted


[jira] [Commented] (SPARK-25923) SparkR UT Failure (checking CRAN incoming feasibility)

2018-11-03 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16674216#comment-16674216
 ] 

Felix Cheung commented on SPARK-25923:
--

thanks - what's the exchange required with CRAN admin?

> SparkR UT Failure (checking CRAN incoming feasibility)
> --
>
> Key: SPARK-25923
> URL: https://issues.apache.org/jira/browse/SPARK-25923
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Liang-Chi Hsieh
>Priority: Blocker
>
> Currently, the following SparkR error blocks PR builders.
> {code:java}
> * checking CRAN incoming feasibility ...Error in 
> .check_package_CRAN_incoming(pkgdir) : 
>   dims [product 26] do not match the length of object [0]
> Execution halted
> {code}
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98362/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98367/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98368/testReport/
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4403/testReport/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [discuss] SparkR CRAN feasibility check server problem

2018-11-01 Thread Felix Cheung
Thanks for bringing this up, and much appreciated for keeping on top of this at
all times.

Would upgrading R be able to fix the issue? Is this perhaps not necessarily
malformed, but some new format used by newer versions? Anyway, we should
consider upgrading the R version if that fixes the problem.

As an option we could also disable the repo check in Jenkins, but I can see
that could also be problematic.


On Thu, Nov 1, 2018 at 7:35 PM Hyukjin Kwon  wrote:

> Hi all,
>
> I want to raise the CRAN failure issue because it has started to block Spark
> PRs from time to time. Since the number
> of PRs has grown hugely in the Spark community, it is critical not to block
> other PRs.
>
> There has been a problem at CRAN (See
> https://github.com/apache/spark/pull/20005 for analysis).
> To cut it short, the root cause is malformed package info served from
> https://cran.r-project.org/src/contrib/PACKAGES
> on the server side, and this had to be fixed by requesting the CRAN
> sysadmin's help.
>
> https://issues.apache.org/jira/browse/SPARK-24152 <- newly open. I am
> pretty sure it's the same issue
> https://issues.apache.org/jira/browse/SPARK-25923 <- reopen/resolved 2
> times
> https://issues.apache.org/jira/browse/SPARK-22812
>
> This has happened 5 times over roughly 10 months, blocking almost
> all PRs in Apache Spark.
> Historically, it once blocked all PRs for a few days, and the whole Spark
> community had to stop working.
>
> I assume this has not been a big issue so far for other projects
> or other people because apparently
> higher versions of R have logic to handle these malformed documents (at
> least I verified R 3.4.0 works fine).
>
> On our side, Jenkins has a low R version (R 3.1.1, if that has not been updated
> since I last checked),
> which is unable to parse the malformed server response.
>
> So, I want to talk about how we are going to handle this. Possible
> solutions are:
>
> 1. We should start a talk with CRAN sysadmin to permanently prevent this
> issue
> 2. We upgrade R to 3.4.0 in Jenkins (however we will not be able to test
> low R versions)
> 3. ...
>
> If everyone is fine with this, I would like to suggest forwarding this email to the
> CRAN sysadmin to discuss it further.
>
> Adding Liang-Chi, Felix and Shivaram, with whom I have already talked about this a
> few times before.
>
> Thanks all.
>
>
>
>


Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-31 Thread Felix Cheung
+1
Checked R doc and all R API changes



From: Denny Lee 
Sent: Wednesday, October 31, 2018 9:13 PM
To: Chitral Verma
Cc: Wenchen Fan; dev@spark.apache.org
Subject: Re: [VOTE] SPARK 2.4.0 (RC5)

+1

On Wed, Oct 31, 2018 at 12:54 PM Chitral Verma 
mailto:chitralve...@gmail.com>> wrote:
+1

On Wed, 31 Oct 2018 at 11:56, Reynold Xin 
mailto:r...@databricks.com>> wrote:
+1

Look forward to the release!



On Mon, Oct 29, 2018 at 3:22 AM Wenchen Fan 
mailto:cloud0...@gmail.com>> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.4.0.

The vote is open until November 1 PST and passes if a majority +1 PMC votes are 
cast, with
a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.4.0-rc5 (commit 
0a4c03f7d084f1d2aa48673b99f3b9496893ce8d):
https://github.com/apache/spark/tree/v2.4.0-rc5

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1291

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-docs/

The list of bug fixes going into 2.4.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12342385

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks; in Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
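
For Java/Scala, a minimal sbt sketch of pointing a throwaway test project at the
staging repository listed above might look like this (the resolver URL is the one
from this vote; the spark-sql module and the 2.4.0 version string are assumptions
used only for illustration):

  // build.sbt -- sketch only: add the RC staging repo and depend on the staged artifacts.
  resolvers += "Apache Spark 2.4.0 RC5 staging" at "https://repository.apache.org/content/repositories/orgapachespark-1291"

  libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0"

Dropping the resolver afterwards and clearing the local artifact cache (for sbt,
the org.apache.spark entries under ~/.ivy2/cache) helps avoid building against an
out-of-date RC later.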

===
What should happen to JIRA tickets still targeting 2.4.0?
===

The current list of open tickets targeted at 2.4.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" 
= 2.4.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


[jira] [Resolved] (SPARK-25859) add scala/java/python example and doc for PrefixSpan

2018-10-27 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-25859.
--
  Resolution: Fixed
Assignee: Huaxin Gao
   Fix Version/s: 2.4.0
Target Version/s: 2.4.0

> add scala/java/python example and doc for PrefixSpan
> 
>
> Key: SPARK-25859
> URL: https://issues.apache.org/jira/browse/SPARK-25859
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 2.4.0
>
>
> scala/java/python examples and doc for PrefixSpan are added in 3.0 in 
> https://issues.apache.org/jira/browse/SPARK-24207. This jira is to add the 
> examples and doc in 2.4.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16693) Remove R deprecated methods

2018-10-27 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-16693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-16693.
--
   Resolution: Fixed
 Assignee: Felix Cheung
Fix Version/s: 3.0.0

> Remove R deprecated methods
> ---
>
> Key: SPARK-16693
> URL: https://issues.apache.org/jira/browse/SPARK-16693
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Felix Cheung
>    Assignee: Felix Cheung
>Priority: Major
> Fix For: 3.0.0
>
>
> For methods deprecated in Spark 2.0.0, we should remove them in 2.1.0 -> 3.0.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12172) Consider removing SparkR internal RDD APIs

2018-10-26 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665908#comment-16665908
 ] 

Felix Cheung commented on SPARK-12172:
--

sounds good

> Consider removing SparkR internal RDD APIs
> --
>
> Key: SPARK-12172
> URL: https://issues.apache.org/jira/browse/SPARK-12172
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>    Reporter: Felix Cheung
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15545) R remove non-exported unused methods, like jsonRDD

2018-10-25 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-15545.
--
Resolution: Duplicate

> R remove non-exported unused methods, like jsonRDD
> --
>
> Key: SPARK-15545
> URL: https://issues.apache.org/jira/browse/SPARK-15545
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.2
>Reporter: Felix Cheung
>Priority: Minor
>
> Need to review what should be removed.
> one reason to not remove this right away is because we have been talking 
> about calling internal methods via `SparkR:::jsonRDD` for this and other RDD 
> methods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15545) R remove non-exported unused methods, like jsonRDD

2018-10-25 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-15545:
-
Affects Version/s: 2.3.2
External issue ID: SPARK-12172

> R remove non-exported unused methods, like jsonRDD
> --
>
> Key: SPARK-15545
> URL: https://issues.apache.org/jira/browse/SPARK-15545
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.2
>Reporter: Felix Cheung
>Priority: Minor
>
> Need to review what should be removed.
> one reason to not remove this right away is because we have been talking 
> about calling internal methods via `SparkR:::jsonRDD` for this and other RDD 
> methods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12172) Consider removing SparkR internal RDD APIs

2018-10-25 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664631#comment-16664631
 ] 

Felix Cheung edited comment on SPARK-12172 at 10/26/18 4:11 AM:


ok, what's our option for spark.lapply?

I'll consider at least removing all other methods that are not used for 
spark.lapply in spark 3.0.0


was (Author: felixcheung):
ok, what's our option for spark.lapply?

> Consider removing SparkR internal RDD APIs
> --
>
> Key: SPARK-12172
> URL: https://issues.apache.org/jira/browse/SPARK-12172
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>Reporter: Felix Cheung
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12172) Consider removing SparkR internal RDD APIs

2018-10-25 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664631#comment-16664631
 ] 

Felix Cheung commented on SPARK-12172:
--

ok, what's our option for spark.lapply?

> Consider removing SparkR internal RDD APIs
> --
>
> Key: SPARK-12172
> URL: https://issues.apache.org/jira/browse/SPARK-12172
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>Reporter: Felix Cheung
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16611) Expose several hidden DataFrame/RDD functions

2018-10-25 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-16611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664628#comment-16664628
 ] 

Felix Cheung commented on SPARK-16611:
--

ping - we are going to consider removing RDD methods in spark 3.0.0

> Expose several hidden DataFrame/RDD functions
> -
>
> Key: SPARK-16611
> URL: https://issues.apache.org/jira/browse/SPARK-16611
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>Priority: Major
>
> Expose the following functions:
> - lapply or map
> - lapplyPartition or mapPartition
> - flatMap
> - RDD
> - toRDD
> - getJRDD
> - cleanup.jobj
> cc:
> [~javierluraschi] [~j...@rstudio.com] [~shivaram]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16611) Expose several hidden DataFrame/RDD functions

2018-10-25 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-16611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664630#comment-16664630
 ] 

Felix Cheung commented on SPARK-16611:
--

see SPARK-12172

> Expose several hidden DataFrame/RDD functions
> -
>
> Key: SPARK-16611
> URL: https://issues.apache.org/jira/browse/SPARK-16611
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>Priority: Major
>
> Expose the following functions:
> - lapply or map
> - lapplyPartition or mapPartition
> - flatMap
> - RDD
> - toRDD
> - getJRDD
> - cleanup.jobj
> cc:
> [~javierluraschi] [~j...@rstudio.com] [~shivaram]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16693) Remove R deprecated methods

2018-10-25 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-16693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664626#comment-16664626
 ] 

Felix Cheung commented on SPARK-16693:
--

retargeted this to spark 3.0.0

> Remove R deprecated methods
> ---
>
> Key: SPARK-16693
> URL: https://issues.apache.org/jira/browse/SPARK-16693
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Felix Cheung
>Priority: Major
>
> For methods deprecated in Spark 2.0.0, we should remove them in 2.1.0 -> 3.0.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16693) Remove R deprecated methods

2018-10-25 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-16693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-16693:
-
Description: For methods deprecated in Spark 2.0.0, we should remove them 
in 2.1.0 -> 3.0.0  (was: For methods deprecated in Spark 2.0.0, we should 
remove them in 2.1.0)

> Remove R deprecated methods
> ---
>
> Key: SPARK-16693
> URL: https://issues.apache.org/jira/browse/SPARK-16693
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>    Reporter: Felix Cheung
>Priority: Major
>
> For methods deprecated in Spark 2.0.0, we should remove them in 2.1.0 -> 3.0.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: DataSourceV2 hangouts sync

2018-10-25 Thread Felix Cheung
Yes please!



From: Ryan Blue 
Sent: Thursday, October 25, 2018 1:10 PM
To: Spark Dev List
Subject: DataSourceV2 hangouts sync

Hi everyone,

There's been some great discussion for DataSourceV2 in the last few months, but 
it has been difficult to resolve some of the discussions and I don't think that 
we have a very clear roadmap for getting the work done.

To coordinate better as a community, I'd like to start a regular sync-up over 
google hangouts. We use this in the Parquet community to have more effective 
community discussions about thorny technical issues and to get aligned on an 
overall roadmap. It is really helpful in that community and I think it would 
help us get DSv2 done more quickly.

Here's how it works: people join the hangout, we go around the list to gather 
topics, have about an hour-long discussion, and then send a summary of the 
discussion to the dev list for anyone that couldn't participate. That way we 
can move topics along, but we keep the broader community in the loop as well 
for further discussion on the mailing list.

I'll volunteer to set up the sync and send invites to anyone that wants to 
attend. If you're interested, please reply with the email address you'd like to 
put on the invite list (if there's a way to do this without specific invites, 
let me know). Also for the first sync, please note what times would work for 
you so we can try to account for people in different time zones.

For the first one, I was thinking some day next week (time TBD by those 
interested) and starting off with a general roadmap discussion before diving 
into specific technical topics.

Thanks,

rb

--
Ryan Blue
Software Engineer
Netflix


[jira] [Resolved] (SPARK-24572) "eager execution" for R shell, IDE

2018-10-24 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-24572.
--
  Resolution: Fixed
Assignee: Weiqiang Zhuang
   Fix Version/s: 3.0.0
Target Version/s: 3.0.0

> "eager execution" for R shell, IDE
> --
>
> Key: SPARK-24572
> URL: https://issues.apache.org/jira/browse/SPARK-24572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Weiqiang Zhuang
>Priority: Major
> Fix For: 3.0.0
>
>
> like python in SPARK-24215
> we could also have eager execution when SparkDataFrame is returned to the R 
> shell
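
A minimal sketch of what this could look like from the R shell, assuming the same
spark.sql.repl.eagerEval.enabled configuration used for the Python REPL in
SPARK-24215 also applies to SparkR (illustrative of the proposal, not a confirmed
API):

# enable eager evaluation when starting the SparkR session (assumed config name)
library(SparkR)
sparkR.session(sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true"))

# returning a SparkDataFrame at the prompt would then print its first rows
# instead of only the object summary
df <- createDataFrame(faithful)
df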



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24516) PySpark Bindings for K8S - make Python 3 the default

2018-10-24 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-24516.
--
  Resolution: Fixed
Assignee: Ilan Filonenko
   Fix Version/s: 3.0.0
Target Version/s: 3.0.0

> PySpark Bindings for K8S - make Python 3 the default
> 
>
> Key: SPARK-24516
> URL: https://issues.apache.org/jira/browse/SPARK-24516
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, PySpark
>Affects Versions: 2.4.0
>Reporter: Ondrej Kokes
>Assignee: Ilan Filonenko
>Priority: Minor
> Fix For: 3.0.0
>
>
> Initial PySpark-k8s bindings have just been resolved (SPARK-23984), but the 
> default Python version there is 2. While you can override this by setting it 
> to 3, I think we should have sensible defaults.
> Python 3 has been around for ten years and is the clear successor, Python 2 
> has only 18 months left in terms of support. There isn't a good reason to 
> suggest Python 2 should be used, not in 2018 and not when both versions are 
> supported.
> The relevant commit [is 
> here|https://github.com/apache/spark/commit/1a644afbac35c204f9ad55f86999319a9ab458c6#diff-6e882d5561424e7e6651eb46f10104b8R194],
>  the version is also [in the 
> documentation|https://github.com/apache/spark/commit/1a644afbac35c204f9ad55f86999319a9ab458c6#diff-b5527f236b253e0d9f5db5164bdb43e9R643].
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-10-21 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658377#comment-16658377
 ] 

Felix Cheung edited comment on SPARK-22947 at 10/21/18 8:53 PM:


so what's our take on this? it seems quite useful for time series analysis, 
which would be quite important for us.

first, there seems to be a question of the syntax - either AS OF, a streaming 
INTERVAL, or some sort of join hint.

second, there is the question of how the optimizer can figure this out.

perhaps the first is not a strict prereq for the second, but they are closely 
related. are we considering splitting this proposal into two and focusing on 
getting the optimizer to figure this out first, perhaps?


was (Author: felixcheung):
so what's our take on this? it seems quite useful for time series analysis 
which would be quite important for us.

first, seems like there is a question of the syntax - either AS OF or streaming 
INTERVAL or some sort of join hint

second, how the optimizer can figure this out.

perhaps the first is not a strict prereq for the second, but they are closely 
related

> SPIP: as-of join in Spark SQL
> -
>
> Key: SPARK-22947
> URL: https://issues.apache.org/jira/browse/SPARK-22947
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Li Jin
>Priority: Major
> Attachments: SPIP_ as-of join in Spark SQL (1).pdf
>
>
> h2. Background and Motivation
> Time series analysis is one of the most common analysis on financial data. In 
> time series analysis, as-of join is a very common operation. Supporting as-of 
> join in Spark SQL will allow many use cases of using Spark SQL for time 
> series analysis.
> As-of join is “join on time” with inexact time matching criteria. Various 
> library has implemented asof join or similar functionality:
> Kdb: https://code.kx.com/wiki/Reference/aj
> Pandas: 
> http://pandas.pydata.org/pandas-docs/version/0.19.0/merging.html#merging-merge-asof
> R: This functionality is called “Last Observation Carried Forward”
> https://www.rdocumentation.org/packages/zoo/versions/1.8-0/topics/na.locf
> JuliaDB: http://juliadb.org/latest/api/joins.html#IndexedTables.asofjoin
> Flint: https://github.com/twosigma/flint#temporal-join-functions
> This proposal advocates introducing new API in Spark SQL to support as-of 
> join.
> h2. Target Personas
> Data scientists, data engineers
> h2. Goals
> * New API in Spark SQL that allows as-of join
> * As-of join of multiple table (>2) should be performant, because it’s very 
> common that users need to join multiple data sources together for further 
> analysis.
> * Define Distribution, Partitioning and shuffle strategy for ordered time 
> series data
> h2. Non-Goals
> These are out of scope for the existing SPIP, should be considered in future 
> SPIP as improvement to Spark’s time series analysis ability:
> * Utilize partition information from data source, i.e, begin/end of each 
> partition to reduce sorting/shuffling
> * Define API for user to implement asof join time spec in business calendar 
> (i.e. lookback one business day, this is very common in financial data 
> analysis because of market calendars)
> * Support broadcast join
> h2. Proposed API Changes
> h3. TimeContext
> TimeContext is an object that defines the time scope of the analysis, it has 
> begin time (inclusive) and end time (exclusive). User should be able to 
> change the time scope of the analysis (i.e, from one month to five year) by 
> just changing the TimeContext. 
> To Spark engine, TimeContext is a hint that:
> can be used to repartition data for join
> serve as a predicate that can be pushed down to storage layer
> Time context is similar to filtering time by begin/end, the main difference 
> is that time context can be expanded based on the operation taken (see 
> example in as-of join).
> Time context example:
> {code:java}
> TimeContext timeContext = TimeContext("20160101", "20170101")
> {code}
> h3. asofJoin
> h4. User Case A (join without key)
> Join two DataFrames on time, with one day lookback:
> {code:java}
> TimeContext timeContext = TimeContext("20160101", "20170101")
> dfA = ...
> dfB = ...
> JoinSpec joinSpec = JoinSpec(timeContext).on("time").tolerance("-1day")
> result = dfA.asofJoin(dfB, joinSpec)
> {code}
> Example input/output:
> {code:java}
> dfA:
> time, quantity
> 20160101, 100
> 20160102, 50
> 20160104, -50
> 20160105, 100
> dfB:
> time, price
> 2015123

[jira] [Commented] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-10-21 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658377#comment-16658377
 ] 

Felix Cheung commented on SPARK-22947:
--

so what's our take on this? it seems quite useful for time series analysis 
which would be quite important for us.

first, seems like there is a question of the syntax - either AS OF or streaming 
INTERVAL or some sort of join hint

second, how the optimizer can figure this out.

perhaps the first is not a strict prereq for the second, but they are closely 
related

> SPIP: as-of join in Spark SQL
> -
>
> Key: SPARK-22947
> URL: https://issues.apache.org/jira/browse/SPARK-22947
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Li Jin
>Priority: Major
> Attachments: SPIP_ as-of join in Spark SQL (1).pdf
>
>
> h2. Background and Motivation
> Time series analysis is one of the most common analysis on financial data. In 
> time series analysis, as-of join is a very common operation. Supporting as-of 
> join in Spark SQL will allow many use cases of using Spark SQL for time 
> series analysis.
> As-of join is “join on time” with inexact time matching criteria. Various 
> library has implemented asof join or similar functionality:
> Kdb: https://code.kx.com/wiki/Reference/aj
> Pandas: 
> http://pandas.pydata.org/pandas-docs/version/0.19.0/merging.html#merging-merge-asof
> R: This functionality is called “Last Observation Carried Forward”
> https://www.rdocumentation.org/packages/zoo/versions/1.8-0/topics/na.locf
> JuliaDB: http://juliadb.org/latest/api/joins.html#IndexedTables.asofjoin
> Flint: https://github.com/twosigma/flint#temporal-join-functions
> This proposal advocates introducing new API in Spark SQL to support as-of 
> join.
> h2. Target Personas
> Data scientists, data engineers
> h2. Goals
> * New API in Spark SQL that allows as-of join
> * As-of join of multiple table (>2) should be performant, because it’s very 
> common that users need to join multiple data sources together for further 
> analysis.
> * Define Distribution, Partitioning and shuffle strategy for ordered time 
> series data
> h2. Non-Goals
> These are out of scope for the existing SPIP, should be considered in future 
> SPIP as improvement to Spark’s time series analysis ability:
> * Utilize partition information from data source, i.e, begin/end of each 
> partition to reduce sorting/shuffling
> * Define API for user to implement asof join time spec in business calendar 
> (i.e. lookback one business day, this is very common in financial data 
> analysis because of market calendars)
> * Support broadcast join
> h2. Proposed API Changes
> h3. TimeContext
> TimeContext is an object that defines the time scope of the analysis, it has 
> begin time (inclusive) and end time (exclusive). User should be able to 
> change the time scope of the analysis (i.e, from one month to five year) by 
> just changing the TimeContext. 
> To Spark engine, TimeContext is a hint that:
> can be used to repartition data for join
> serve as a predicate that can be pushed down to storage layer
> Time context is similar to filtering time by begin/end, the main difference 
> is that time context can be expanded based on the operation taken (see 
> example in as-of join).
> Time context example:
> {code:java}
> TimeContext timeContext = TimeContext("20160101", "20170101")
> {code}
> h3. asofJoin
> h4. User Case A (join without key)
> Join two DataFrames on time, with one day lookback:
> {code:java}
> TimeContext timeContext = TimeContext("20160101", "20170101")
> dfA = ...
> dfB = ...
> JoinSpec joinSpec = JoinSpec(timeContext).on("time").tolerance("-1day")
> result = dfA.asofJoin(dfB, joinSpec)
> {code}
> Example input/output:
> {code:java}
> dfA:
> time, quantity
> 20160101, 100
> 20160102, 50
> 20160104, -50
> 20160105, 100
> dfB:
> time, price
> 20151231, 100.0
> 20160104, 105.0
> 20160105, 102.0
> output:
> time, quantity, price
> 20160101, 100, 100.0
> 20160102, 50, null
> 20160104, -50, 105.0
> 20160105, 100, 102.0
> {code}
> Note row (20160101, 100) of dfA is joined with (20151231, 100.0) of dfB. This 
> is an important illustration of the time context - it is able to expand the 
> context to 20151231 on dfB because of the 1 day lookback.
> h4. Use Case B (join with key)
> To join on time and another key (for instance, id), we use “by” to specify 
> the key.
> {cod

[jira] [Resolved] (SPARK-24207) PrefixSpan: R API

2018-10-21 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-24207.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

> PrefixSpan: R API
> -
>
> Key: SPARK-24207
> URL: https://issues.apache.org/jira/browse/SPARK-24207
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24207) PrefixSpan: R API

2018-10-21 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-24207:


Assignee: Huaxin Gao

> PrefixSpan: R API
> -
>
> Key: SPARK-24207
> URL: https://issues.apache.org/jira/browse/SPARK-24207
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Huaxin Gao
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25634) New Metrics in External Shuffle Service to help identify abusing application

2018-10-21 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658343#comment-16658343
 ] 

Felix Cheung commented on SPARK-25634:
--

how about off-heap and netty buffer usage?

> New Metrics in External Shuffle Service to help identify abusing application
> 
>
> Key: SPARK-25634
> URL: https://issues.apache.org/jira/browse/SPARK-25634
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Ye Zhou
>Priority: Minor
>
> We run Spark on YARN, and deploy Spark external shuffle service as part of 
> YARN NM aux service. External Shuffle Service is shared by all Spark 
> applications. SPARK-24355 enables the threads reservation to handle 
> non-ChunkFetchRequest. SPARK-21501 limits the memory usage for Guava Cache to 
> avoid OOM in shuffle service which could crash NodeManager. But still some 
> application may generate a large amount of shuffle blocks which could heavily 
> decrease the performance on some shuffle servers. When this abusing behavior 
> happens, it might further decreases the overall performance for other 
> applications if they happen to use the same shuffle servers. We have been 
> seeing issues like this in our cluster, but there is no way for us to figure 
> out which application is abusing shuffle service.
> SPARK-18364 has enabled expose out shuffle service metrics to Hadoop Metrics 
> System. It is better if we can have the following metrics and also metrics 
> divided by applicationID:
> 1. *shuffle server on-heap memory consumption for caching shuffle indexes*
> 2. *breakdown of shuffle indexes caching memory consumption by local 
> executors*
> We can generate metrics when 
> ExternalShuffleBlockHandler-->getSortBasedShuffleBlockData, which will 
> trigger the Cache load. We can roughly be able to get the metrics from the 
> shuffleindexfile size when putting into the cache and moved out from the 
> cache.
> 3. *shuffle server load for shuffle block fetch requests*
> 4. *breakdown of shuffle server block fetch requests load by remote executors*
> We can generate metrics in ExternalShuffleBlockHandler-->handleMessage when a 
> new OpenBlocks message is received.
> Open discussion for more metrics that could potentially influence the overall 
> shuffle service performance. 
> We can print out those metrics which are divided by applicationIDs in log, 
> since it is hard to define fixed key and use numerical value for this kind of 
> metrics. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25675) [Spark Job History] Job UI page does not show pagination with one page

2018-10-21 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-25675.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

> [Spark Job History] Job UI page does not show pagination with one page
> --
>
> Key: SPARK-25675
> URL: https://issues.apache.org/jira/browse/SPARK-25675
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: ABHISHEK KUMAR GUPTA
>Assignee: Shivu Sondur
>Priority: Major
> Fix For: 3.0.0
>
>
> 1. set spark.ui.retainedJobs= 1 in spark-default conf of spark Job History
>  2. Restart Job History
>  3. Submit Beeline jobs for 1
>  4. Launch Job History UI Page
>  5. Select JDBC Running Application ID from Incomplete Application Page
>  6. Launch Jo Page
>  7. Pagination Panel display based on page size as below
>  
> 
>  Completed Jobs XXX
>  Page: 1 2 3 ... XX Page: Jump to 1 show 100 items in a 
> page
>  
> -
>  8. Change the value in Jump to 1 show *XXX* items in page, that is display 
> all completed Jobs in a single page
> *Actual Result:*
>  All completed Jobs will be display in a Page but no Pagination panel so that 
> User can modify and set the number of Jobs in a page.
> *Expected Result:*
>  It should display the Pagination panel as below
>  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>  Page: 1                                                             1 Page: 
> Jump to 1 show *XXX* items in a page
>  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>  Pagination of page size *1* because it is displaying total number of 
> completed Jobs in a single Page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25675) [Spark Job History] Job UI page does not show pagination with one page

2018-10-21 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-25675:


Assignee: Shivu Sondur

> [Spark Job History] Job UI page does not show pagination with one page
> --
>
> Key: SPARK-25675
> URL: https://issues.apache.org/jira/browse/SPARK-25675
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: ABHISHEK KUMAR GUPTA
>Assignee: Shivu Sondur
>Priority: Major
> Fix For: 3.0.0
>
>
> 1. set spark.ui.retainedJobs= 1 in spark-default conf of spark Job History
>  2. Restart Job History
>  3. Submit Beeline jobs for 1
>  4. Launch Job History UI Page
>  5. Select JDBC Running Application ID from Incomplete Application Page
>  6. Launch Jo Page
>  7. Pagination Panel display based on page size as below
>  
> 
>  Completed Jobs XXX
>  Page: 1 2 3 ... XX Page: Jump to 1 show 100 items in a 
> page
>  
> -
>  8. Change the value in Jump to 1 show *XXX* items in page, that is display 
> all completed Jobs in a single page
> *Actual Result:*
>  All completed Jobs will be display in a Page but no Pagination panel so that 
> User can modify and set the number of Jobs in a page.
> *Expected Result:*
>  It should display the Pagination panel as below
>  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>  Page: 1                                                             1 Page: 
> Jump to 1 show *XXX* items in a page
>  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>  Pagination of page size *1* because it is displaying total number of 
> completed Jobs in a single Page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25730) Kubernetes scheduler tries to read pod details that it just deleted

2018-10-21 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-25730:
-
Affects Version/s: (was: 2.5.0)

> Kubernetes scheduler tries to read pod details that it just deleted
> ---
>
> Key: SPARK-25730
> URL: https://issues.apache.org/jira/browse/SPARK-25730
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Mike Kaplinskiy
>Assignee: Mike Kaplinskiy
>Priority: Major
> Fix For: 3.0.0
>
>
> See [https://github.com/apache/spark/pull/22720/files] for the fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25730) Kubernetes scheduler tries to read pod details that it just deleted

2018-10-21 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-25730:


Assignee: Mike Kaplinskiy

> Kubernetes scheduler tries to read pod details that it just deleted
> ---
>
> Key: SPARK-25730
> URL: https://issues.apache.org/jira/browse/SPARK-25730
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Mike Kaplinskiy
>Assignee: Mike Kaplinskiy
>Priority: Major
> Fix For: 3.0.0
>
>
> See [https://github.com/apache/spark/pull/22720/files] for the fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25730) Kubernetes scheduler tries to read pod details that it just deleted

2018-10-21 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-25730.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

> Kubernetes scheduler tries to read pod details that it just deleted
> ---
>
> Key: SPARK-25730
> URL: https://issues.apache.org/jira/browse/SPARK-25730
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Mike Kaplinskiy
>Assignee: Mike Kaplinskiy
>Priority: Major
> Fix For: 3.0.0
>
>
> See [https://github.com/apache/spark/pull/22720/files] for the fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Zeppelin add hadoop submarine(machine learning framework) interpreter

2018-10-20 Thread Felix Cheung
Very cool!



From: Jeff Zhang 
Sent: Friday, October 19, 2018 7:14 AM
To: dev@zeppelin.apache.org
Subject: Re: Zeppelin add hadoop submarine(machine learning framework) 
interpreter

Thanks xun. This would be a great addon for zeppelin to support deep
learning. I will check the design later.



liu xun wrote on Friday, October 19, 2018 at 3:56 PM:

> Hi,
> Hadoop Submarine is the latest machine learning framework subproject in
> the Hadoop 3.2 release. It allows Hadoop to support TensorFlow, MXNet,
> Caffe, Spark, and other deep learning frameworks, providing a full-featured
> system for machine learning algorithm development, distributed model
> training, model management, and model publishing. Combined with Hadoop's
> built-in data storage and data processing capabilities, it enables data
> scientists to better mine and extract value from their data.
>
>
> I was involved in the development of the Hadoop Submarine project, so I
> plan to add a Hadoop Submarine interpreter module to Zeppelin, extending
> Zeppelin's support for deep learning development. This is my design
> document; if you have any opinions, please put them directly in the
> document. Thank you!
>
>
> https://docs.google.com/document/d/16YN8Kjmxt1Ym3clx5pDnGNXGajUT36hzQxjaik1cP4A/edit?ts=5bc6bfdd
>
>
>


Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Felix Cheung
I’m in favor of it. If you check the PR, it’s a few isolated script changes and 
otherwise test-only changes. It should have low impact on the release but give 
much better integration test coverage.



From: Erik Erlandson 
Sent: Tuesday, October 16, 2018 8:20 AM
To: dev
Subject: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

I'd like to propose including integration testing for Kerberos on the Spark 2.4 
release:
https://github.com/apache/spark/pull/22608

Arguments in favor:
1) it improves testing coverage on a feature important for integrating with 
HDFS deployments
2) its intersection with existing code is small - it consists primarily of new 
testing code, with a bit of refactoring into 'main' and 'test' sub-trees. These 
new tests appear stable.
3) Spark 2.4 is still in RC, with outstanding correctness issues.

The argument 'against' that I'm aware of would be the relatively large size of 
the PR. I believe this is considered above, but am soliciting community 
feedback before committing.
Cheers,
Erik



Re: SparkR issue

2018-10-14 Thread Felix Cheung
1. It seems like it's spending a lot of time in R (slicing the data, I guess?) and 
not in Spark.
2. Could you write it to a CSV file locally and then read it from Spark? Something 
like the sketch below.
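
For 2, a minimal sketch, assuming the R data frame fits on local disk and the 
path is readable by Spark (the file path and the schema object are illustrative):

# write the R data frame out once instead of serializing it row by row
local_path <- "/tmp/rdf.csv"
write.csv(rdf, local_path, row.names = FALSE)

# read it back through Spark's CSV reader; schema is the StructType you already built
df <- SparkR::read.df(local_path, source = "csv", header = "true", schema = schema)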



From: ayan guha 
Sent: Monday, October 8, 2018 11:21 PM
To: user
Subject: SparkR issue

Hi

We are seeing some weird behaviour in Spark R.

We created an R data frame with 600K records and 29 columns. Then we tried to 
convert the R data frame to a SparkDataFrame using

df <- SparkR::createDataFrame(rdf)

from RStudio. It hung; we had to kill the process after 1-2 hours.

We also tried following:
df <- SparkR::createDataFrame(rdf, numPartition=4000)
df <- SparkR::createDataFrame(rdf, numPartition=300)
df <- SparkR::createDataFrame(rdf, numPartition=10)

Same result. In both scenarios RStudio appears to be working, but there is no 
trace of any jobs in the Spark Application Master view.

Finally, we used this:

df <- SparkR::createDataFrame(rdf, schema=schema) , schema is a StructType.

This took 25 mins to create the Spark DataFrame. However, the job did show up in 
the Application Master view, and it shows only 20-30 secs. So where did the rest 
of the time go?

Question:
1. Is this expected behavior? (I hope not.) How should we speed up this bit?
2. We understand better options would be to read the data from external sources, 
but we need this data to be generated for simulation purposes. What's possibly 
going wrong?


Best
Ayan



--
Best Regards,
Ayan Guha


Re: [DISCUSS][K8S] Local dependencies with Kubernetes

2018-10-07 Thread Felix Cheung
Jars and libraries that are only accessible locally at the driver seem fairly 
limited? Don’t you want the same on all executors?




From: Yinan Li 
Sent: Friday, October 5, 2018 11:25 AM
To: Stavros Kontopoulos
Cc: rve...@dotnetrdf.org; dev
Subject: Re: [DISCUSS][K8S] Local dependencies with Kubernetes

> Just to be clear: in client mode things work right? (Although I'm not
really familiar with how client mode works in k8s - never tried it.)

If the driver runs on the submission client machine, yes, it should just work. 
If the driver runs in a pod, however, it faces the same problem as in cluster 
mode.

Yinan

On Fri, Oct 5, 2018 at 11:06 AM Stavros Kontopoulos 
<stavros.kontopou...@lightbend.com> wrote:
@Marcelo is correct. Mesos does not have something similar. Only Yarn does due 
to the distributed cache thing.
I have described most of the above in the jira; there are also some other 
options.

Best,
Stavros

On Fri, Oct 5, 2018 at 8:28 PM, Marcelo Vanzin 
<van...@cloudera.com.invalid> wrote:
On Fri, Oct 5, 2018 at 7:54 AM Rob Vesse 
<rve...@dotnetrdf.org> wrote:
> Ideally this would all just be handled automatically for users in the way 
> that all other resource managers do

I think you're giving other resource managers too much credit. In
cluster mode, only YARN really distributes local dependencies, because
YARN has that feature (its distributed cache) and Spark just uses it.

Standalone doesn't do it (see SPARK-4160) and I don't remember seeing
anything similar on the Mesos side.

There are things that could be done; e.g. if you have HDFS you could
do a restricted version of what YARN does (upload files to HDFS, and
change the "spark.jars" and "spark.files" URLs to point to HDFS
instead). Or you could turn the submission client into a file server
that the cluster-mode driver downloads files from - although that
requires connectivity from the driver back to the client.

Neither is great, but better than not having that feature.

Just to be clear: in client mode things work right? (Although I'm not
really familiar with how client mode works in k8s - never tried it.)

--
Marcelo

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org






Re: Spark SQL parser and DDL

2018-10-07 Thread Felix Cheung
Sounds like a good idea?

Would this be a step in the direction of supporting variations of the SQL 
dialect, too?



From: Ryan Blue 
Sent: Thursday, October 4, 2018 8:56 AM
To: Spark Dev List
Subject: Spark SQL parser and DDL


Hi everyone,

I’ve been working on SQL DDL statements for v2 tables lately, including the 
proposed additions to drop, rename, and alter columns. The most recent update 
I’ve added is to allow transformation functions in the PARTITION BY clause to 
pass to v2 data sources. This allows sources like Iceberg to do partition 
pruning internally.

One of the difficulties has been that the SQL parser is coupled to the current 
logical plans and includes details that are specific to them. For example, data 
source table creation makes determinations like the EXTERNAL keyword is not 
allowed and instead the mode (external or managed) is set depending on whether 
a path is set. It also translates IF NOT EXISTS into a SaveMode and introduces 
a few other transformations.

The main problem with this is that converting the SQL plans produced by the 
parser to v2 plans requires interpreting these alterations and not the original 
SQL. Another consequence is that there are two parsers: AstBuilder in 
spark-catalyst and SparkSqlParser in spark-sql (core) because not all of the 
plans are available to the parser in the catalyst module.

I think it would be cleaner if we added a sql package with catalyst plans that 
carry the SQL options as they were parsed, and then convert those plans to 
specific implementations depending on the tables that are used. That makes 
support for v2 plans much cleaner by converting from a generic SQL plan instead 
of creating a v1 plan that assumes a data source table and then converting that 
to a v2 plan (playing telephone with logical plans).

This has simplified the work I’ve been doing to add PARTITION BY 
transformations. Instead of needing to add transformations to the CatalogTable 
metadata that’s used everywhere, this only required a change to the rule that 
converts from the parsed SQL plan to CatalogTable-based v1 plans. It is also 
cleaner to have the logic for converting to CatalogTable in DataSourceAnalysis 
instead of in the parser itself.

Are there objections to this approach for integrating v2 plans?

--
Ryan Blue
Software Engineer
Netflix


Re: [DISCUSS] Syntax for table DDL

2018-10-02 Thread Felix Cheung
I think it has been an important “selling point” that Spark is “mostly 
compatible” with Hive DDL.

I have seen a lot of teams suffering from switching between Presto and Hive 
dialects.

So one question I have is: are we at the point of switching from Hive-compatible 
to ANSI SQL, say?

Perhaps a more critical question: what does it take to get the platform to 
support both, by making the ANTLR grammar extensible?




From: Alessandro Solimando 
Sent: Tuesday, October 2, 2018 12:35 AM
To: rb...@netflix.com
Cc: Xiao Li; dev
Subject: Re: [DISCUSS] Syntax for table DDL

I agree with Ryan, a "standard" and more widely adopted syntax is usually a 
good idea, with possibly some slight improvements like "bulk deletion" of 
columns (especially because both the syntax and the semantics are clear), 
rather than staying with Hive syntax at any cost.

I am personally following this PR with a lot of interest, thanks for all the 
work along this direction.

Best regards,
Alessandro

On Mon, 1 Oct 2018 at 20:21, Ryan Blue  wrote:

What do you mean by consistent with the syntax in SqlBase.g4? These aren’t 
currently defined, so we need to decide what syntax to support. There are more 
details below, but the syntax I’m proposing is more standard across databases 
than Hive, which uses confusing and non-standard syntax.

I doubt that we want to support Hive syntax for a few reasons. Hive uses the 
same column CHANGE statement for multiple purposes, so it ends up with strange 
patterns for simple tasks, like updating the column’s type:

ALTER TABLE t CHANGE a1 a1 INT;


The column name is doubled because old name, new name, and type are always 
required. So you have to know the type of a column to change its name and you 
have to double up the name to change its type. Hive also allows a couple other 
oddities:

  *   Column reordering with FIRST and AFTER keywords. Column reordering is 
tricky to get right so I’m not sure we want to add it.
  *   RESTRICT and CASCADE to signal whether to change all partitions or not. 
Spark doesn’t support partition-level schemas except through Hive, and even 
then I’m not sure how reliable it is.

I know that we wouldn’t necessarily have to support these features from Hive, 
but I’m pointing them out to ask the question: why copy Hive’s syntax if it is 
unlikely that Spark will implement all of the “features”? I’d rather go with 
SQL syntax from databases like PostgreSQL or others that are more standard and 
common.

The more “standard” versions of these statements are like what I’ve proposed:

  *   ALTER TABLE ident ALTER COLUMN qualifiedName TYPE dataType: ALTER is used 
by SQL Server, Access, DB2, and PostgreSQL; MODIFY by MySQL and Oracle. COLUMN 
is optional in Oracle and TYPE is omitted by databases other than PosgreSQL. I 
think we could easily add MODIFY as an alternative to the second ALTER (and 
maybe alternatives like UPDATE and CHANGE) and make both TYPE and COLUMN 
optional.
  *   ALTER TABLE ident RENAME COLUMN qualifiedName TO qualifiedName: This 
syntax is supported by PostgreSQL, Oracle, and DB2. MySQL uses the same syntax 
as Hive and it appears that SQL server doesn’t have this statement. This also 
match the table rename syntax, which uses TO.
  *   ALTER TABLE ident DROP (COLUMN | COLUMNS) qualifiedNameList: This matches 
PostgreSQL, Oracle, DB2, and SQL server. MySQL makes COLUMN optional. Most 
don’t allow deleting multiple columns, but it’s a reasonable extension.

While we’re on the subject of ALTER TABLE DDL, I should note that all of the 
databases use ADD COLUMN syntax that differs from Hive (and currently, Spark):

  *   ALTER TABLE ident ADD COLUMN qualifiedName dataType (',' qualifiedName 
dataType)*: All other databases I looked at use ADD COLUMN, but not all of them 
support adding multiple columns at the same time. Hive requires ( and ) 
enclosing the columns and uses the COLUMNS keyword instead of COLUMN. I think 
that Spark should be updated to make the parens optional and to support both 
keywords, COLUMN and COLUMNS.

What does everyone think? Is it reasonable to use the more standard syntax 
instead of using Hive as a base?

rb

On Fri, Sep 28, 2018 at 11:07 PM Xiao Li 
<gatorsm...@gmail.com> wrote:
Are they consistent with the current syntax defined in SqlBase.g4? I think we 
are following the Hive DDL syntax: 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/Partition/Column

Ryan Blue wrote on Friday, September 28, 2018 at 3:47 PM:

Hi everyone,

I’m currently working on new table DDL statements for v2 tables. For context, 
the new logical plans for DataSourceV2 require a catalog interface so that 
Spark can create tables for operations like CTAS. The proposed TableCatalog API 
also includes an API for altering those tables so we can make ALTER TABLE 
statements work. I’m implementing those DDL statements, which will make it into 
upstream Spark when the TableCatalog PR is merged.

Since I’m adding ne

Re: On Scala 2.12.7

2018-10-01 Thread Felix Cheung
Although, like you said, Spark support for Scala 2.12 is beta anyway, so 
shouldn’t we get it to a working state by basing it on 2.12.7? There shouldn’t be 
a stability issue, since it is not officially “supported”.




From: Wenchen Fan 
Sent: Monday, October 1, 2018 7:43 PM
To: Sean Owen
Cc: sad...@zoho.com; Spark dev list
Subject: Re: On Scala 2.12.7

My major concern is how it will affect end users if Spark 2.4 is built with 
Scala versions prior to 2.12.7. Generally I hesitate to upgrade the Scala 
version when we are very close to a release, and the Scala 2.12 build of Spark 
2.4 is beta anyway.

On Sat, Sep 29, 2018 at 6:46 AM Sean Owen 
<sro...@apache.org> wrote:
I'm forking the discussion about Scala 2.12.7 from the 2.4.0 RC vote thread.

2.12.7 was released yesterday, and is even labeled as fixing Spark
2.4 compatibility! https://www.scala-lang.org/news/2.12.7 We should
look into it, yes.

Darcy identified, and they fixed, this issue:
https://github.com/scala/scala/pull/7156 while finishing the work for
Scala 2.12.

However we already worked around this in Spark, no? at
https://github.com/apache/spark/commit/f29c2b5287563c0d6f55f936bd5a75707d7b2b1f

So we should go ahead and update to use 2.12.7, yes, and undo this workaround?
But this doesn't necessarily block a 2.4.0 release, if it's already
worked around.

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org



Re: can Spark 2.4 work on JDK 11?

2018-09-29 Thread Felix Cheung
Not officially. We have seen problems with JDK 10 as well. It would be great if 
you or someone else would like to contribute to getting it to work.



From: kant kodali 
Sent: Tuesday, September 25, 2018 2:31 PM
To: user @spark
Subject: can Spark 2.4 work on JDK 11?

Hi All,

Can Spark 2.4 work on JDK 11? I feel like there are a lot of features added in 
JDK 9, 10, and 11 that can make the deployment process a whole lot better, and 
of course some more syntax sugar similar to Scala.

Thanks!


[jira] [Updated] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2018-09-29 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-25572:
-
Description: 
follow up to SPARK-24255

from 2.3.2 release we can see that CRAN doesn't seem to respect the system 
requirements as running tests - we have seen cases where SparkR is run on Java 
10, which unfortunately Spark does not start on. For 2.4.x, lets attempt 
skipping all tests

  was:
follow up to SPARK-24255

from 2.3.2 release we can see that CRAN doesn't seem to respect the system 
requirements as running tests - we have seen cases where SparkR is run on Java 
10, which unfortunately Spark does not start on. For 2.4, lets attempt skipping 
all tests


> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.4.1, 2.5.0
>
>
> follow up to SPARK-24255
> from 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements as running tests - we have seen cases where SparkR is run on 
> Java 10, which unfortunately Spark does not start on. For 2.4.x, lets attempt 
> skipping all tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2018-09-29 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16633129#comment-16633129
 ] 

Felix Cheung commented on SPARK-25572:
--

[~cloud_fan] while not a blocker, it would be great to include in 2.4.0 if we 
have another RC

> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.4.1, 2.5.0
>
>
> follow up to SPARK-24255
> from 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements as running tests - we have seen cases where SparkR is run on 
> Java 10, which unfortunately Spark does not start on. For 2.4, lets attempt 
> skipping all tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2018-09-29 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16633130#comment-16633130
 ] 

Felix Cheung commented on SPARK-25572:
--

[~shivaram]

> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.4.1, 2.5.0
>
>
> follow up to SPARK-24255
> from 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements as running tests - we have seen cases where SparkR is run on 
> Java 10, which unfortunately Spark does not start on. For 2.4, lets attempt 
> skipping all tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


