Re: Cutting the RC for Spark 2.2.1 release

2017-11-13 Thread Felix Cheung
Quick update:

We merged 6 fixes Friday and 7 fixes today (thanks!). Since some were 
hand-merged, I’m waiting for clean builds and test passes from Jenkins. As of 
now it looks like we need to take one more fix for Scala 2.10.

With any luck we should be tagging for build tomorrow morning (PT).

There should not be any issue targeting 2.2.1 except for SPARK-22042. As it is 
not a regression and it seems it might take a while, we won’t block the 
release on it.

_
From: Felix Cheung
Sent: Wednesday, November 8, 2017 3:57 PM
Subject: Cutting the RC for Spark 2.2.1 release
To: dev@spark.apache.org


Hi!

As we are closing out the few remaining known issues, I think we are ready to 
tag and cut the 2.2.1 release.

If you are aware of any issue that you think should go into this release, 
please feel free to ping me and mark the JIRA as targeting 2.2.1. I will be 
scrubbing JIRA in the next few days.

So unless we hear otherwise, I’m going to tag and build the RC starting 
Saturday EOD (PT). Please be patient since I’m new at this :), but I will 
keep dev@ posted with any updates.

Yours
RM for 2.2.1






Re: Timeline for Spark 2.3

2017-11-13 Thread dji...@dataxu.com
Hi,

What is the process to request an issue/fix to be included in the next
release? Is there a place to vote for features?
I am interested in https://issues.apache.org/jira/browse/SPARK-13127, to see
if we can get Spark to upgrade Parquet to 1.9.0, which addresses
https://issues.apache.org/jira/browse/PARQUET-686.
Can we include the fix in the Spark 2.3 release?

Thanks,

Dong






Re: Cutting the RC for Spark 2.2.1 release

2017-11-13 Thread Felix Cheung
It happens with anything built with Maven on a clean machine;
it couldn’t connect to the Maven Central repo.


From: Holden Karau 
Sent: Monday, November 13, 2017 10:38:03 AM
To: Felix Cheung
Cc: dev@spark.apache.org
Subject: Re: Cutting the RC for Spark 2.2.1 release

Which script is this from?

On Mon, Nov 13, 2017 at 10:37 AM Felix Cheung wrote:
Build/test looks good but I’m hitting a new issue with sonatype when tagging

"Host name 'repo1.maven.org' does not match the 
certificate subject provided by the peer 
(CN=repo.maven.apache.org, O="Sonatype, Inc", 
L=Fulton, ST=MD, C=US)"

https://issues.sonatype.org/browse/MVNCENTRAL-1369

Stay tuned.




Re: Cutting the RC for Spark 2.2.1 release

2017-11-13 Thread Holden Karau
Which script is this from?

On Mon, Nov 13, 2017 at 10:37 AM Felix Cheung 
wrote:

> Build/test looks good but I’m hitting a new issue with sonatype when
> tagging
>
> "Host name 'repo1.maven.org' does not match the certificate subject
> provided by the peer (CN=repo.maven.apache.org, O="Sonatype, Inc",
> L=Fulton, ST=MD, C=US)"
>
> https://issues.sonatype.org/browse/MVNCENTRAL-1369
>
> Stay tuned.
--
Twitter: https://twitter.com/holdenkarau


Re: Cutting the RC for Spark 2.2.1 release

2017-11-13 Thread Felix Cheung
Ouch ;) yes that works and RC1 is tagged.



From: Sean Owen 
Sent: Monday, November 13, 2017 10:54:48 AM
To: Felix Cheung
Cc: Holden Karau; dev@spark.apache.org
Subject: Re: Cutting the RC for Spark 2.2.1 release

It's repo.maven.apache.org ?





Re: Cutting the RC for Spark 2.2.1 release

2017-11-13 Thread Felix Cheung
Build/test looks good but I’m hitting a new issue with sonatype when tagging

"Host name 'repo1.maven.org' does not match the certificate subject provided by 
the peer (CN=repo.maven.apache.org, O="Sonatype, Inc", L=Fulton, ST=MD, C=US)"

https://issues.sonatype.org/browse/MVNCENTRAL-1369

Stay tuned.








Re: Cutting the RC for Spark 2.2.1 release

2017-11-13 Thread Felix Cheung
I did change it, but getting unknown host?

[ERROR] Non-resolvable parent POM for org.apache.spark:spark-parent_2.11:2.2.1-SNAPSHOT:
Could not transfer artifact org.apache:apache:pom:14 from/to central
(https://repo.maven.org/maven2): repo.maven.org: Name or service not known and
'parent.relativePath' points at wrong local POM @ line 22, column 11:
Unknown host repo.maven.org: Name or service not known -> [Help 2]






Re: Cutting the RC for Spark 2.2.1 release

2017-11-13 Thread Sean Owen
I'm not seeing a problem building, myself. However, we could change the
location of the Maven Repository in our POM to
https://repo.maven.apache.org/maven2/ without any consequence.

The only reason we overrode it was to force HTTPS, which still
doesn't look like the default (!):
https://maven.apache.org/guides/introduction/introduction-to-the-pom.html#Super_POM

On a related note, we could also update the POM to inherit from the latest
Apache parent POM, while we're at it, to get the latest declarations
relevant to the ASF. Doesn't need to happen in 2.2.x.
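
For reference, a sketch of what that override might look like in the root
pom.xml (assuming the repository is declared roughly along these lines; the
exact id and surrounding layout in Spark's POM may differ):

  <repositories>
    <repository>
      <id>central</id>
      <name>Maven Repository</name>
      <!-- switched from repo1.maven.org to the canonical HTTPS endpoint -->
      <url>https://repo.maven.apache.org/maven2/</url>
      <releases><enabled>true</enabled></releases>
      <snapshots><enabled>false</enabled></snapshots>
    </repository>
  </repositories>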

On Mon, Nov 13, 2017 at 12:39 PM Felix Cheung 
wrote:

> It happens with anything built with Maven on a clean machine;
> it couldn’t connect to the Maven Central repo.
>
>


Re: Cutting the RC for Spark 2.2.1 release

2017-11-13 Thread Sean Owen
It's repo.maven.apache.org ?

On Mon, Nov 13, 2017 at 12:52 PM Felix Cheung 
wrote:

> I did change it, but getting unknown host?
>
> [ERROR] Non-resolvable parent POM for
> org.apache.spark:spark-parent_2.11:2.2.1-SNAPSHOT: Could not transfer
> artifact org.apache:apache:pom:14 from/to central (
> https://repo.maven.org/maven2): repo.maven.org: Name or service not known
> and 'parent.relativePath' points at wrong local POM @ line 22, column 11:
> Unknown host repo.maven.org: Name or service not known -> [Help 2]
>
>
>


Re: Reload some static data during struct streaming

2017-11-13 Thread Burak Yavuz
I think if you don't cache the jdbc table, then it should auto-refresh.
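
A minimal sketch of what that looks like, reusing the names from the snippet
in the original message (only the cache call changes):

// Without df_meta.cache(), the static side of the stream-static join
// is re-planned on every micro-batch, so each batch re-reads
// v_entity_ap_rel from MySQL.
val df_meta = spark.read.format("jdbc")
  .option("url", mysql_url)
  .option("dbtable", "v_entity_ap_rel")
  .load()
// note: no df_meta.cache() here; join df_meta exactly as before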



Reload some static data during struct streaming

2017-11-13 Thread spark receiver
Hi 

I’m using Structured Streaming (Spark 2.2) to receive Kafka messages, and it 
works great. The thing is, I need to join the Kafka messages with a relatively 
static table stored in a MySQL database (let’s call it metadata here).

So is it possible to reload the metadata table after some time interval (like 
daily) without restarting the running Structured Streaming query?

Code snippet follows:

// df_meta contains important information to join with the dataframe read from kafka
val df_meta = spark.read.format("jdbc")
  .option("url", mysql_url)
  .option("dbtable", "v_entity_ap_rel")
  .load()
df_meta.cache()

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "x.x.x.x:9092")
  .option("fetch.message.max.bytes", "5000")
  .option("kafka.max.partition.fetch.bytes", "5000")
  .option("subscribe", "rawdb.raw_data")
  .option("failOnDataLoss", true)
  .option("startingOffsets", "latest")
  .load()
  .select($"value".as[Array[Byte]])
  .map(avroDeserialize(_))
  .as[ApRawData]
  .select("APMAC", "RSSI", "sourceMacAddress", "updatingTime")
  .as("a") // alias the streaming side so $"a.apmac" below resolves
  .join(df_meta.as("b"), $"a.apmac" === $"b.apmac")

df.selectExpr("ENTITYID", "CLIENTMAC", "STIME",
    "case when a.rrssi >= b.rssi then '1' when a.rrssi < b.nearbyrssi then '3' else '2' end FLAG",
    "substring(stime,1,13) STIME_HOUR")
  .distinct()
  .writeStream.format("parquet").partitionBy("STIME_HOUR")
  .option("checkpointLocation", "/user/root/t_cf_table_chpt")
  .trigger(ProcessingTime("5 minutes"))
  .start("T_CF_TABLE")
  .awaitTermination()

Mason

Re: Reload some static data during struct streaming

2017-11-13 Thread spark receiver
I need it cached to improve throughput; I only hope it can be refreshed once a 
day, not every batch.
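
One workaround sketch (not from the thread; it assumes the df_meta and spark
defined in the original snippet, and a driver that stays up): unpersist and
re-cache the static DataFrame on a daily timer, so later micro-batches read a
fresh snapshot from MySQL instead of the stale cache.

import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical refresher: once a day, drop the cached snapshot and
// eagerly re-materialize it, forcing a fresh read from MySQL.
val refresher = Executors.newSingleThreadScheduledExecutor()
refresher.scheduleAtFixedRate(new Runnable {
  override def run(): Unit = {
    df_meta.unpersist(blocking = false)
    df_meta.cache()
    df_meta.count() // action to repopulate the cache eagerly
  }
}, 24, 24, TimeUnit.HOURS)

A micro-batch that overlaps the refresh may see a partly re-read table, so
this is best-effort rather than transactional.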


> On Nov 13, 2017, at 4:49 PM, Burak Yavuz wrote:
> 
> I think if you don't cache the jdbc table, then it should auto-refresh.
> 