spark hivethriftserver problem on 1.5.0 -> 1.6.0 upgrade

2016-01-26 Thread james.gre...@baesystems.com
Hi

I posted this on the user list yesterday; I am posting it here now because, on
further investigation, I am pretty sure this is a bug:


On upgrading from 1.5.0 to 1.6.0 I have a problem with HiveThriftServer2; I
have this code:

val hiveContext = new HiveContext(SparkContext.getOrCreate(conf));

val thing = 
hiveContext.read.parquet("hdfs://dkclusterm1.imp.net:8020/user/jegreen1/ex208")

thing.registerTempTable("thing")

HiveThriftServer2.startWithContext(hiveContext)


When I start things up on the cluster my hive-site.xml is found – I can see 
that the metastore connects:


INFO  metastore - Trying to connect to metastore with URI 
thrift://dkclusterm2.imp.net:9083
INFO  metastore - Connected to metastore.


But then later on the thrift server seems not to connect to the remote hive 
metastore but to start a derby instance instead:

INFO  AbstractService - Service:CLIService is started.
INFO  ObjectStore - ObjectStore, initialize called
INFO  Query - Reading in results for query 
"org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is 
closing
INFO  MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
INFO  ObjectStore - Initialized ObjectStore
INFO  HiveMetaStore - 0: get_databases: default
INFO  audit - ugi=jegreen1  ip=unknown-ip-addr  cmd=get_databases: 
default
INFO  HiveMetaStore - 0: Shutting down the object store...
INFO  audit - ugi=jegreen1  ip=unknown-ip-addr  cmd=Shutting down the 
object store...
INFO  HiveMetaStore - 0: Metastore shutdown complete.
INFO  audit - ugi=jegreen1  ip=unknown-ip-addr  cmd=Metastore shutdown 
complete.
INFO  AbstractService - Service:ThriftBinaryCLIService is started.
INFO  AbstractService - Service:HiveServer2 is started.

On 1.5.0 the same bit of the log reads:

INFO  AbstractService - Service:CLIService is started.
INFO  metastore - Trying to connect to metastore with URI 
thrift://dkclusterm2.imp.net:9083  *** ie 1.5.0 connects to remote hive
INFO  metastore - Connected to metastore.
INFO  AbstractService - Service:ThriftBinaryCLIService is started.
INFO  AbstractService - Service:HiveServer2 is started.
INFO  ThriftCLIService - Starting ThriftBinaryCLIService on port 1 with 
5...500 worker threads



So if I connect to this with JDBC I can see all the tables on the Hive server – 
but not any of the temporary tables – I guess they are going to Derby.
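
For what it's worth, here is a quick check I can run from the driver just before
starting the thrift server (a sketch against the Spark 1.6 API; the expected
output noted in the comments is an assumption on my part, not verified output):

// Confirm the temp table is registered on this exact context instance -- the
// same instance that is passed to HiveThriftServer2.startWithContext.
println(hiveContext.tableNames().mkString(", "))  // should include "thing"
hiveContext.sql("SHOW TABLES").show()             // tables visible to this context,
                                                  // with an isTemporary flag per row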

I see someone on the databricks website is also having this problem.


Thanks

James


Re: Spark LDA model reuse with new set of data

2016-01-26 Thread doruchiulan
Yes, I just saw the same thing myself last night. Thanks






Re: Spark 2.0.0 release plan

2016-01-26 Thread Koert Kuipers
thanks, that's all i needed

On Tue, Jan 26, 2016 at 6:19 PM, Sean Owen  wrote:

> I think it will come significantly later -- or else we'd be at code
> freeze for 2.x in a few days. I haven't heard anyone discuss this
> officially but had batted around May or so instead informally in
> conversation. Does anyone have a particularly strong opinion on that?
> That's basically an extra 3 month period.
>
> https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage
>
> On Tue, Jan 26, 2016 at 10:00 PM, Koert Kuipers  wrote:
> > Is the idea that spark 2.0 comes out roughly 3 months after 1.6? So
> > quarterly release as usual?
> > Thanks
>


Re: Unable to compile and test Spark in IntelliJ

2016-01-26 Thread Iulian Dragoș
On Tue, Jan 19, 2016 at 6:06 AM, Hyukjin Kwon  wrote:

> Hi all,
>
> I usually have been working with Spark in IntelliJ.
>
> Before this PR,
> https://github.com/apache/spark/commit/7cd7f2202547224593517b392f56e49e4c94cabc
> (for `[SPARK-12575][SQL] Grammar parity with existing SQL parser`), I was able
> to just open the project and then run some tests with the IntelliJ Run button.
>
> However, it looks like that PR adds some ANTLR files for parsing, and I cannot
> run the tests as I did. So I ended up running mvn compile first
> and then running the tests with IntelliJ.
>
> I can still run some tests with sbt or maven on the command line, but this is
> a bit inconvenient. I just want to run some tests as I did in IntelliJ.
>
> I followed this
> https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
> several times but it still emits some exceptions such as
>
> Error:(779, 34) not found: value SparkSqlParser
> case ast if ast.tokenType == SparkSqlParser.TinyintLiteral =>
>  ^
>
> and I still have to run mvn compile or mvn test first for them.
>
> Is there any good way to run some Spark tests within IntelliJ as I did
> before?
>

I'm using Eclipse, but all I had to do in order to build in the IDE was to
add `target/generated-sources/antlr3` to the project sources, after
building once in Sbt. You probably have the sources there already.

iulian


>
> Thanks!
>



-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: spark hivethriftserver problem on 1.5.0 -> 1.6.0 upgrade

2016-01-26 Thread Yin Huai
Can you post more logs, especially the lines around "Initializing execution hive
..." (this is for an internally used fake metastore, and it is Derby) and
"Initializing HiveMetastoreConnection version ..." (this is for the real
metastore; it should be your remote one)? Also, those temp tables are stored in
memory and are associated with a HiveContext. If you cannot see the temp tables,
it usually means that the HiveContext you used with JDBC was different from the
one used to create the temp table. However, in your case you are using
HiveThriftServer2.startWithContext(hiveContext), so it would be good to provide
more logs and see what happened.
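
For example, here is a minimal sketch of the temp-table scoping described above
(Spark 1.6 API; the SparkContext `sc` and the parquet path are placeholders),
assuming each HiveContext instance keeps its own in-memory catalog of temp tables:

import org.apache.spark.sql.hive.HiveContext

val ctxA = new HiveContext(sc)
val ctxB = new HiveContext(sc)  // a separate context, with its own temp-table catalog

ctxA.read.parquet("/tmp/ex208").registerTempTable("thing")

ctxA.tableNames().contains("thing")  // true: the temp table was registered on ctxA
ctxB.tableNames().contains("thing")  // false: ctxB never saw this temp table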

Thanks,

Yin

On Tue, Jan 26, 2016 at 1:33 AM, james.gre...@baesystems.com <
james.gre...@baesystems.com> wrote:

> Hi
>
> I posted this on the user list yesterday; I am posting it here now
> because, on further investigation, I am pretty sure this is a bug:
>
>
> On upgrading from 1.5.0 to 1.6.0 I have a problem with
> HiveThriftServer2; I have this code:
>
> val hiveContext = new HiveContext(SparkContext.getOrCreate(conf));
>
> val thing = hiveContext.read.parquet("hdfs://
> dkclusterm1.imp.net:8020/user/jegreen1/ex208")
>
> thing.registerTempTable("thing")
>
> HiveThriftServer2.startWithContext(hiveContext)
>
>
> When I start things up on the cluster my hive-site.xml is found – I can
> see that the metastore connects:
>
>
> INFO  metastore - Trying to connect to metastore with URI thrift://
> dkclusterm2.imp.net:9083
> INFO  metastore - Connected to metastore.
>
>
> But then later on the thrift server seems not to connect to the remote
> hive metastore but to start a derby instance instead:
>
> INFO  AbstractService - Service:CLIService is started.
> INFO  ObjectStore - ObjectStore, initialize called
> INFO  Query - Reading in results for query
> "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used
> is closing
> INFO  MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
> INFO  ObjectStore - Initialized ObjectStore
> INFO  HiveMetaStore - 0: get_databases: default
> INFO  audit - ugi=jegreen1  ip=unknown-ip-addr  cmd=get_databases:
> default
> INFO  HiveMetaStore - 0: Shutting down the object store...
> INFO  audit - ugi=jegreen1  ip=unknown-ip-addr  cmd=Shutting down
> the object store...
> INFO  HiveMetaStore - 0: Metastore shutdown complete.
> INFO  audit - ugi=jegreen1  ip=unknown-ip-addr  cmd=Metastore
> shutdown complete.
> INFO  AbstractService - Service:ThriftBinaryCLIService is started.
> INFO  AbstractService - Service:HiveServer2 is started.
>
> On 1.5.0 the same bit of the log reads:
>
> INFO  AbstractService - Service:CLIService is started.
> INFO  metastore - Trying to connect to metastore with URI thrift://
> dkclusterm2.imp.net:9083  *** ie 1.5.0 connects to remote hive
> INFO  metastore - Connected to metastore.
> INFO  AbstractService - Service:ThriftBinaryCLIService is started.
> INFO  AbstractService - Service:HiveServer2 is started.
> INFO  ThriftCLIService - Starting ThriftBinaryCLIService on port 1
> with 5...500 worker threads
>
>
>
> So if I connect to this with JDBC I can see all the tables on the Hive
> server – but not any of the temporary tables – I guess they are going to Derby.
>
> I see someone on the databricks website is also having this problem.
>
>
> Thanks
>
> James
>


Issue with spark-shell in yarn mode

2016-01-26 Thread ndjido
Hi folks,

On Spark 1.6.0, I submitted 2 lines of code via spark-shell in Yarn-client mode:

1) sc.parallelize(Array(1,2,3,3,3,3,4)).collect()

2) sc.parallelize(Array(1,2,3,3,3,3,4)).map( x => (x, 1)).collect()

1) works well whereas 2) raises the following exception: 

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1314)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.take(RDD.scala:1288)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
at $iwC$$iwC$$iwC.<init>(<console>:41)
at $iwC$$iwC.<init>(<console>:43)
at $iwC.<init>(<console>:45)
at <init>(<console>:47)
at .<init>(<console>:51)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at 
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at 
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at 
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at 
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 

Re: Spark 2.0.0 release plan

2016-01-26 Thread Sean Owen
I think it will come significantly later -- or else we'd be at code
freeze for 2.x in a few days. I haven't heard anyone discuss this
officially but had batted around May or so instead informally in
conversation. Does anyone have a particularly strong opinion on that?
That's basically an extra 3 month period.

https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage

On Tue, Jan 26, 2016 at 10:00 PM, Koert Kuipers  wrote:
> Is the idea that spark 2.0 comes out roughly 3 months after 1.6? So
> quarterly release as usual?
> Thanks




Re: Spark LDA model reuse with new set of data

2016-01-26 Thread Joseph Bradley
Hi,

This is more a question for the user list, not the dev list, so I'll CC
user.

If you're using mllib.clustering.LDAModel (RDD API), then can you make sure
you're using a LocalLDAModel (or convert to it from DistributedLDAModel)?
You can then call topicDistributions() on the new data.

If you're using ml.clustering.LDAModel (DataFrame API), then you can call
transform() on new data.
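
For example, here is a rough sketch of the RDD-API path (Spark 1.6 MLlib; the
save path and the new-document vectors below are made up for illustration):

import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.{DistributedLDAModel, LocalLDAModel}
import org.apache.spark.mllib.linalg.Vectors

def topicsForNewDocs(sc: SparkContext): Unit = {
  // A model trained with the default (EM) optimizer loads as a DistributedLDAModel;
  // convert it to a LocalLDAModel before inferring topics for unseen documents.
  val model: LocalLDAModel = DistributedLDAModel.load(sc, "/models/my-lda").toLocal

  // New documents as (id, term-count vector) pairs over the same vocabulary used in training.
  val newDocs = sc.parallelize(Seq(
    (0L, Vectors.sparse(model.vocabSize, Array(1, 4), Array(2.0, 1.0)))
  ))

  // One topic-mixture vector per document id.
  model.topicDistributions(newDocs).collect().foreach(println)
}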

Does that work?

Joseph

On Tue, Jan 19, 2016 at 6:21 AM, doruchiulan  wrote:

> Hi,
>
> Just so you know, I am new to Spark, and also very new to ML (this is my
> first contact with ML).
>
> Ok, I am trying to write a DSL where you can run some commands.
>
> I did a command that trains the Spark LDA; it produces the topics I want,
> and I saved the model using the save method provided by the LDAModel.
>
> Now I want to load this LDAModel and use it to predict on a new set of
> data.
> I call the load method and obtain the LDAModel instance, but here I am stuck.
>
> Isn't this possible? Am I wrong in how I understand LDA, and we cannot
> reuse a trained LDA to analyse new data?
>
> If it's possible, can you point me to some documentation, or give me a hint
> on how I should do that.
>
> Thx
>
>
>
>
>


RE: Unable to compile and test Spark in IntelliJ

2016-01-26 Thread Mao, Wei
I used to hit the same compile error within IntelliJ, and I resolved it by clicking:

View --> Tool Windows --> Maven Projects --> Spark Project Catalyst --> Plugins 
--> antlr3, then remake project

Thanks,
William Mao

From: Iulian Dragoș [mailto:iulian.dra...@typesafe.com]
Sent: Wednesday, January 27, 2016 12:12 AM
To: Hyukjin Kwon
Cc: dev@spark.apache.org
Subject: Re: Unable to compile and test Spark in IntelliJ



On Tue, Jan 19, 2016 at 6:06 AM, Hyukjin Kwon wrote:
Hi all,

I usually have been working with Spark in IntelliJ.
Before this PR,
https://github.com/apache/spark/commit/7cd7f2202547224593517b392f56e49e4c94cabc
(for `[SPARK-12575][SQL] Grammar parity with existing SQL parser`), I was able to
just open the project and then run some tests with the IntelliJ Run button.

However, it looks like that PR adds some ANTLR files for parsing, and I cannot run
the tests as I did. So I ended up running mvn compile first and
then running the tests with IntelliJ.

I can still run some tests with sbt or maven on the command line, but this is a bit
inconvenient. I just want to run some tests as I did in IntelliJ.

I followed this 
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools 
several times but it still emits some exceptions such as

Error:(779, 34) not found: value SparkSqlParser
case ast if ast.tokenType == SparkSqlParser.TinyintLiteral =>
 ^

and I still have to run mvn compile or mvn test first for them.

Is there any good way to run some Spark tests within IntelliJ as I did before?

I'm using Eclipse, but all I had to do in order to build in the IDE was to add 
`target/generated-sources/antlr3` to the project sources, after building once 
in Sbt. You probably have the sources there already.

iulian


Thanks!



--

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com



RE: Unable to compile and test Spark in IntelliJ

2016-01-26 Thread Hyukjin Kwon
Thanks guys! That works well.
On 27 Jan 2016 12:14, "Mao, Wei"  wrote:

> I used to hit the same compile error within IntelliJ, and I resolved it by clicking:
>
>
>
> View --> Tool Windows --> Maven Projects --> Spark Project Catalyst --> Plugins -->
> antlr3, then remake project
>
>
>
> Thanks,
>
> William Mao
>
>
>
> *From:* Iulian Dragoș [mailto:iulian.dra...@typesafe.com]
> *Sent:* Wednesday, January 27, 2016 12:12 AM
> *To:* Hyukjin Kwon
> *Cc:* dev@spark.apache.org
> *Subject:* Re: Unable to compile and test Spark in IntelliJ
>
>
>
>
>
>
>
> On Tue, Jan 19, 2016 at 6:06 AM, Hyukjin Kwon  wrote:
>
> Hi all,
>
> I usually have been working with Spark in IntelliJ.
>
> Before this PR,
> https://github.com/apache/spark/commit/7cd7f2202547224593517b392f56e49e4c94cabc
> (for `[SPARK-12575][SQL] Grammar parity with existing SQL parser`), I was able
> to just open the project and then run some tests with the IntelliJ Run button.
>
>
> However, it looks like that PR adds some ANTLR files for parsing, and I cannot
> run the tests as I did. So I ended up running mvn compile first
> and then running the tests with IntelliJ.
>
>
> I can still run some tests with sbt or maven on the command line, but this is
> a bit inconvenient. I just want to run some tests as I did in IntelliJ.
>
> I followed this
> https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
> several times but it still emits some exceptions such as
>
>
>
> Error:(779, 34) not found: value SparkSqlParser
>
> case ast if ast.tokenType == SparkSqlParser.TinyintLiteral =>
>
>  ^
>
>
>
> and I still have to run mvn compile or mvn test first for them.
>
> Is there any good way to run some Spark tests within IntelliJ as I did
> before?
>
>
>
> I'm using Eclipse, but all I had to do in order to build in the IDE was to
> add `target/generated-sources/antlr3` to the project sources, after
> building once in Sbt. You probably have the sources there already.
>
>
>
> iulian
>
>
>
>
> Thanks!
>
>
>
>
>
> --
>
>
> --
> Iulian Dragos
>
>
>
> --
> Reactive Apps on the JVM
> www.typesafe.com
>
>
>