> so it does not get expanded by the shell).
>
> But it's really weird to be setting SPARK_HOME in the environment of
> your node managers. YARN shouldn't need to know about that.
>
> On Thu, Oct 4, 2018 at 10:22 AM Jianshi Huang
> wrote:
> >
> > https://github.com/apache/spark/blob/88e7e87bd5c052e10f52d4bb97a9d
d from your gateway machine to YARN by
> default.
>
> You probably have some configuration (in spark-defaults.conf) that
> tells YARN to use a cached copy. Get rid of that configuration, and
> you can use whatever version you like.
> On Thu, Oct 4, 2018 at 2:19 AM Jianshi Huang
> wrote:
> sparkConf = pyspark.SparkConf().setAll([
> ('spark.scheduler.mode', 'FAIR')
> ,('spark.shuffle.service.enabled', 'true')
> ,('spark.dynamicAllocation.enabled', 'true')
> ])
> py_files =
> ['hdfs://emr-header-1.cluster-68492:9000/lib/py4j-0.10.7-src.zip']
> sc = pyspark.SparkContext(appName="Jianshi", master="yarn-client",
> conf=sparkConf, pyFiles=py_files)
>
>
Thanks,
--
Jianshi Huang
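For context, Marcelo's "cached copy" note usually refers to spark.yarn.archive or spark.yarn.jars. A minimal Scala sketch for checking whether either is set (the property names are the standard YARN-mode settings; everything else here is illustrative):

    import org.apache.spark.SparkConf

    // If either property points at a fixed HDFS location, YARN reuses that
    // cached Spark build instead of the one on your gateway machine.
    val conf = new SparkConf() // picks up spark-defaults.conf under spark-submit
    Seq("spark.yarn.archive", "spark.yarn.jars").foreach { key =>
      println(s"$key -> ${conf.getOption(key).getOrElse("<unset>")}")
    }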
- Are all files readable by the user running the history server?
> - Did all applications call sc.stop() correctly (i.e. files do not have
> the ".inprogress" suffix)?
>
> Other than that, always look at the logs first, looking for any errors
> that may be thrown.
>
>
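A quick way to audit both points above, sketched with the Hadoop FileSystem API (the event log path is hypothetical; use whatever spark.eventLog.dir points at):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Print each event log's permissions and flag applications that never
    // called sc.stop(): their logs keep the ".inprogress" suffix and are
    // skipped by the history server.
    val fs = FileSystem.get(new Configuration())
    val eventLogDir = new Path("/user/spark/applicationHistory") // hypothetical
    fs.listStatus(eventLogDir).foreach { status =>
      val note = if (status.getPath.getName.endsWith(".inprogress")) " (in progress)" else ""
      println(s"${status.getPermission} ${status.getPath.getName}$note")
    }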
BTW, is there an option to set file permission for spark event logs?
Jianshi
On Thu, May 28, 2015 at 11:25 AM, Jianshi Huang
wrote:
> Hmm... all files under the event log folder have permission 770, but
> strangely my account cannot read other users' files. Permission denied.
>
>
>
> On Wed, May 27, 2015 at 5:33 AM, Jianshi Huang
> wrote:
>
>> No one using History server? :)
>>
>> Am I the only one who needs to see all users' logs?
>>
>> Jianshi
>>
>> On Thu, May 21, 2015 at 1:29 PM, Jianshi Huang
>> wrote:
>&
No one using History server? :)
Am I the only one who needs to see all users' logs?
Jianshi
On Thu, May 21, 2015 at 1:29 PM, Jianshi Huang
wrote:
> Hi,
>
> I'm using Spark 1.4.0-rc1 and I'm using default settings for history
> server.
>
> But I can only see my own
Hi,
I'm using Spark 1.4.0-rc1 and I'm using default settings for history server.
But I can only see my own logs. Is it possible to view all users' logs? The
permission is fine for the user group.
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
>= 2014-04-30))
PhysicalRDD [meta#143,nvar#145,date#147], MapPartitionsRDD[6] at
explain at <console>:32
Jianshi
On Tue, May 12, 2015 at 10:34 PM, Olivier Girardot
wrote:
> can you post the explain too ?
>
> Le mar. 12 mai 2015 à 12:11, Jianshi Huang a
> écrit :
>
>> Hi,
Looks like https://issues.apache.org/jira/browse/SPARK-5446 is still open;
when can we have it fixed? :)
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
I'm using the default settings.
Jianshi
On Wed, May 6, 2015 at 7:05 PM, twinkle sachdeva wrote:
> Hi,
>
> Can you please share the compression and related settings you are using?
>
> Thanks,
> Twinkle
>
> On Wed, May 6, 2015 at 4:15 PM, Jianshi Huang
> wrote:
I'm facing this error in Spark 1.3.1
https://issues.apache.org/jira/browse/SPARK-4105
Does anyone know a workaround? Change the compression codec for
shuffle output?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
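For anyone else hitting SPARK-4105's FAILED_TO_UNCOMPRESS error: one workaround people try is moving off the default snappy codec. A sketch (the property and codec names are the standard Spark 1.x ones; whether this sidesteps the bug is not guaranteed):

    import org.apache.spark.SparkConf

    // spark.io.compression.codec covers shuffle outputs and spills in
    // Spark 1.x; "lzf" and "lz4" are the built-in alternatives to "snappy".
    val conf = new SparkConf().set("spark.io.compression.codec", "lzf")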
>> Fix Version of SPARK-4520 is not set.
>> I assume it was fixed in 1.3.0
>>
>> Cheers
>>
>> On Fri, Apr 24, 2015 at 11:00 AM, Yin Huai wrote:
>>
>>> The exception looks like the one mentioned in
>>> https://issues.apache.org/jira/browse/SPARK-4520. What is the version
>>> of Spark?
at
parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:126)
at
parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:193)
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Oh, I found it out. Need to import org.apache.spark.sql.functions._
Then I can do
table.select(lit("2015-04-22").as("date"))
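Spelled out as a self-contained sketch (the table name is hypothetical; assumes a SQLContext named sqlContext):

    import org.apache.spark.sql.functions._

    // lit() lifts a Scala constant into a Column so it can be selected
    // alongside real columns; .as() names it, like SQL's AS.
    val df = sqlContext.table("pmt") // hypothetical table
    df.select(lit("2015-04-22").as("date")).show()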
Jianshi
On Wed, Apr 22, 2015 at 7:27 PM, Jianshi Huang
wrote:
> Hi,
>
> I want to do this in Spark SQL DSL:
>
> select '2015-04-22
Hi,
I want to do this in Spark SQL DSL:
select '2015-04-22' as date
from table
How to do this?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Hi,
I want to write this in Spark SQL DSL:
select map('c1', c1, 'c2', c2) as m
from table
Is there a way?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
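For readers on later versions: org.apache.spark.sql.functions.map (Spark 2.0+) expresses exactly this in the DSL. A sketch, assuming a hypothetical DataFrame df with columns c1 and c2:

    import org.apache.spark.sql.functions.{col, lit, map}

    // Equivalent of SQL's map('c1', c1, 'c2', c2) as m: keys and values
    // alternate in the argument list, and all of them must be Columns.
    val withMap = df.select(map(lit("c1"), col("c1"), lit("c2"), col("c2")).as("m"))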
the big picture – in some models,
> friction can be a huge factor in the equations; in some others it is just
> part of the landscape
>
>
>
> *From:* Gerard Maas [mailto:gerard.m...@gmail.com]
> *Sent:* Friday, April 17, 2015 10:12 AM
>
> *To:* Evo Eftimov
> *Cc:* Tath
(one DStream
-> multiple DStreams)
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Hi,
Anyone has similar request?
https://issues.apache.org/jira/browse/SPARK-6561
When we save a DataFrame into Parquet files, we also want to have it
partitioned.
The proposed API looks like this:
def saveAsParquet(path: String, partitionColumns: Seq[String])
--
Jianshi Huang
LinkedIn
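Follow-up note: the capability requested here shipped in the DataFrame writer API as partitionBy (Spark 1.4+). A sketch with a hypothetical DataFrame df and output path:

    // Writes Parquet partitioned on disk by the "date" column, producing a
    // path/date=<value>/part-*.parquet directory layout.
    df.write.partitionBy("date").parquet("hdfs:///data/pmt") // hypothetical path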
Oh, by default it's set to 0L.
I'll try setting it to 30000 immediately. Thanks for the help!
Jianshi
On Mon, Mar 16, 2015 at 11:32 PM, Jianshi Huang
wrote:
> Thanks Shixiong!
>
> Very strange that our tasks were retried on the same executor again and
of our cases are the second one; we set
> "spark.scheduler.executorTaskBlacklistTime" to 30000 to solve such "No
> space left on device" errors. So if a task runs unsuccessfully on some
> executor, it won't be scheduled to the same executor for 30 seconds.
>
>
> Best Regards,
> Shi
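For reference, the setting under discussion as it would be applied (an internal, undocumented property in Spark 1.x; the value is in milliseconds):

    import org.apache.spark.SparkConf

    // After a task fails on an executor, don't reschedule it on that same
    // executor for 30 seconds, so retries land on healthier nodes.
    val conf = new SparkConf()
      .set("spark.scheduler.executorTaskBlacklistTime", "30000")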
I created a JIRA: https://issues.apache.org/jira/browse/SPARK-6353
On Mon, Mar 16, 2015 at 5:36 PM, Jianshi Huang
wrote:
> Hi,
>
> We're facing "No space left on device" errors lately from time to time.
> The job will fail after retries. Obviously in such cases, retry w
the problematic datanode before retrying it.
And maybe dynamically allocate another datanode if dynamic allocation is
enabled.
I think there needs to be a class of fatal errors that can't be recovered
by retries, and it's best if Spark handles them gracefully.
Thanks,
--
Jianshi Huang
LinkedIn:
Liancheng also found out that the Spark jars are not included in the
classpath of URLClassLoader.
Hmm... we're very close to the truth now.
Jianshi
On Fri, Mar 13, 2015 at 6:03 PM, Jianshi Huang
wrote:
> I'm almost certain the problem is the ClassLoader.
>
> So adding
I'm almost certain the problem is the ClassLoader.
So adding
fork := true
solves the problems for test and run.
The problem is how to fork a JVM for the sbt console: fork in console :=
true doesn't seem to work...
Jianshi
On Fri, Mar 13, 2015 at 4:35 PM, Jianshi Huang
wrote:
> I gues
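A sketch of the relevant build.sbt lines (as far as I know, sbt's console task always runs inside the sbt JVM, so forking never applies to it, which matches the behavior above):

    // build.sbt (sketch)
    // Fork a fresh JVM for run/test so Spark's reflection sees a plain
    // application classloader instead of sbt's layered one.
    fork := true
    fork in Test := true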
nction is throwing exception
>>>
>>> Exception in thread "main" scala.ScalaReflectionException: class
>>> org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with primordial
>>> classloader with boot classpath [.] not found
>>>
>>
Forget about my last message; I was confused. Spark 1.2.1 + Scala 2.10.4
started by the SBT console command also failed with this error. However,
running from a standard spark shell works.
Jianshi
On Fri, Mar 13, 2015 at 2:46 PM, Jianshi Huang
wrote:
> Hmm... look like the console command st
Hmm... looks like the console command still starts Spark 1.3.0 with Scala
2.11.6 even though I changed them in build.sbt.
So the test with 1.2.1 is not valid.
Jianshi
On Fri, Mar 13, 2015 at 2:34 PM, Jianshi Huang
wrote:
> I've confirmed it only failed in console started by SBT.
>
>
@transient val sqlc = new org.apache.spark.sql.SQLContext(sc)
[info] implicit def sqlContext = sqlc
[info] import sqlc._
Jianshi
On Fri, Mar 13, 2015 at 3:10 AM, Jianshi Huang
wrote:
> BTW, I was running tests from SBT when I got the errors. One test turns a
> Seq of case class to Data
:23 AM, Jianshi Huang
wrote:
> Same issue here. But the classloader in my exception is somehow different.
>
> scala.ScalaReflectionException: class
> org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with
> java.net.URLClassLoader@53298398 of type class java.net.URLCla
th boot classpath [.] not found
>>>
>>>
>>> Here's more info on the versions I am using -
>>>
>>> 2.11
>>> 1.2.1
>>> 2.11.5
>>>
>>> Please let me know how can I resolve this problem.
>>>
>>> Thanks
>>> Ashish
>>>
>>
>>
>
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
user home
> directories either. Typically, as in YARN, you would have a number of
> directories (on different disks) mounted and configured for local
> storage for jobs.
>
> On Wed, Mar 11, 2015 at 7:42 AM, Jianshi Huang
> wrote:
> > Unfortunately /tmp mount is really small in ou
doesn't support expressions or wildcards in that configuration. For
> each application, the local directories need to be constant. If you
> have users submitting different Spark applications, those can each set
> spark.local.dir.
>
> - Patrick
>
> On Wed, Mar 11, 2015 at 12:14 AM, J
Hi,
I need to set per-user spark.local.dir, how can I do that?
I tried both
/x/home/${user.name}/spark/tmp
and
/x/home/${USER}/spark/tmp
And neither worked. Looks like it has to be a constant setting in
spark-defaults.conf. Right?
Any ideas how to do that?
Thanks,
--
Jianshi Huang
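Per the replies, the substitution has to happen before Spark sees the value, for example in each application's own conf. A minimal sketch:

    import org.apache.spark.SparkConf

    // Spark does not expand ${USER} or ${user.name} in spark-defaults.conf,
    // so resolve the user in application code (or in a submit wrapper script).
    val user = sys.props("user.name")
    val conf = new SparkConf().set("spark.local.dir", s"/x/home/$user/spark/tmp")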
ar 5, 2015 at 4:01 PM, Shao, Saisai wrote:
> I think there are a lot of JIRAs trying to solve this problem (
> https://issues.apache.org/jira/browse/SPARK-5763). Basically sort-merge
> join is a good choice.
>
>
>
> Thanks
>
> Jerry
>
>
>
> *From:* Jianshi Hua
48 PM, Jianshi Huang
wrote:
> I see. I'm using core's join. The data might have some skewness
> (checking).
>
> I understand shuffle can spill data to disk, but when consuming it, say in
> cogroup or groupByKey, it still needs to read the whole group's elements,
> right? I gues
On the Spark core side, all the shuffle-related operations can spill the
> data to disk; there is no need to read the whole partition into memory. But if
> you use SparkSQL, it depends on how SparkSQL uses these operators.
>
>
>
> CC @hao if he has some thoughts on it.
>
>
>
> Than
issues when the join key is skewed or the number of keys is
> small, so you will hit OOM.
>
>
>
> Maybe you could monitor each stage's or task's shuffle and GC status, as
> well as system status, to identify the problem.
>
>
>
> Thanks
>
> Jerry
>
>
>
> *From:* Jianshi
One really interesting thing is that when I'm using the
netty-based spark.shuffle.blockTransferService, there are no OOM error
messages (java.lang.OutOfMemoryError: Java heap space).
Any idea why it's not here?
I'm using Spark 1.2.1.
Jianshi
On Thu, Mar 5, 2015 at 1:56 PM, Jiansh
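For reference, the switch being compared (a Spark 1.2.x-era setting; netty became the default in 1.2, with "nio" as the legacy alternative):

    import org.apache.spark.SparkConf

    // Selects the shuffle block transfer implementation in Spark 1.2-1.5.
    val conf = new SparkConf().set("spark.shuffle.blockTransferService", "nio")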
at 2:11 PM, Jianshi Huang
wrote:
> Hmm... ok, previous errors are still block fetch errors.
>
> 15/03/03 10:22:40 ERROR RetryingBlockFetcher: Exception while beginning
> fetch of 11 outstanding blocks
> java.io.IOException: Failed to connect to host-xxx
Davidson wrote:
> Drat! That doesn't help. Could you scan from the top to see if there were
> any fatal errors preceding these? Sometimes an OOM will cause this type of
> issue further down.
>
> On Tue, Mar 3, 2015 at 8:16 PM, Jianshi Huang
> wrote:
>
>> The failed
its logs as well.
>
> On Tue, Mar 3, 2015 at 11:03 AM, Jianshi Huang
> wrote:
>
>> Sorry that I forgot the subject.
>>
>> And in the driver, I got many FetchFailedException. The error messages are
>>
>> 15/03/03 10:34:32 WARN TaskSetManager: Lost task 31.0 in
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
at
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
Jianshi
On Wed, Mar 4, 2015 at 2:55 AM, Jianshi Huang
wrote:
> Hi,
>
> I got this error message:
>
SNAPSHOT I built around Dec. 20. Are there any
bug fixes related to shuffle block fetching or index files after that?
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
serde?
Loading tables using parquetFile vs. loading tables from Hive metastore
with Parquet serde
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
: https://issues.apache.org/jira/browse/SPARK-5828
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Reynold Xin :
>
> I think we made the binary protocol compatible across all versions, so you
>> should be fine with using any one of them. 1.2.1 is probably the best since
>> it is the most recent stable release.
>>
>> On Tue, Feb 10, 2015 at 8:43 PM, Jianshi Huang
, 1.3.0)
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Hi,
Anyone has implemented the default Pig Loader in Spark? (loading delimited
text files with .pig_schema)
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
>> val parameterInfo = new
>> SimpleGenericUDAFParameterInfo(inspectors.toArray, false, false)
>> resolver.getEvaluator(parameterInfo)
>>
>> FYI
>>
>> On Tue, Jan 13, 2015 at 1:51 PM, Jianshi Huang
>> wrote:
>>
>>> Hi,
>>
org.apache.spark.sql.catalyst.plans.logical.Aggregate$$anonfun$output$6.apply(basicOperators.scala:143)
I'm using the latest branch-1.2.
I found in a PR that percentile and percentile_approx are supported. Is this a bug?
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
FYI,
The latest Hive 0.14/Parquet will have column renaming support.
Jianshi
On Wed, Dec 10, 2014 at 3:37 AM, Michael Armbrust
wrote:
> You might also try out the recently added support for views.
>
> On Mon, Dec 8, 2014 at 9:31 PM, Jianshi Huang
> wrote:
>
>> Ah... I see. T
>
>
>
> On Sat, Dec 6, 2014 at 8:28 PM, Jianshi Huang
> wrote:
>
>> Ok, found another possible bug in Hive.
>>
>> My current solution is to use ALTER TABLE CHANGE to rename the column
>> names.
>>
>> The problem is after renaming the colum
right? We can extract
> some useful functions from JsonRDD.scala, so others can access them.
>
> Thanks,
>
> Yin
>
> On Mon, Dec 8, 2014 at 1:29 AM, Jianshi Huang
> wrote:
>
>> I checked the source code for inferSchema. Looks like this is exactly
>> what I want:
I checked the source code for inferSchema. Looks like this is exactly what
I want:
val allKeys = rdd.map(allKeysWithValueTypes).reduce(_ ++ _)
Then I can do createSchema(allKeys).
Jianshi
On Sun, Dec 7, 2014 at 2:50 PM, Jianshi Huang
wrote:
> Hmm..
>
> I've created
Hmm..
I've created a JIRA: https://issues.apache.org/jira/browse/SPARK-4782
Jianshi
On Sun, Dec 7, 2014 at 2:32 PM, Jianshi Huang
wrote:
> Hi,
>
> What's the best way to convert RDD[Map[String, Any]] to a SchemaRDD?
>
> I'm currently converting ea
Hi,
What's the best way to convert RDD[Map[String, Any]] to a SchemaRDD?
I'm currently converting each Map to a JSON String and do
JsonRDD.inferSchema.
How about adding inferSchema support to Map[String, Any] directly? It would
be very useful.
Thanks,
--
Jianshi Huang
LinkedI
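For concreteness, a sketch of the JSON-string detour described above (json4s is my assumption for the serializer; assumes rdd: RDD[Map[String, Any]] and a SQLContext named sqlContext):

    import org.json4s.{DefaultFormats, Extraction}
    import org.json4s.jackson.JsonMethods.{compact, render}

    // Serialize each Map to a JSON string, then let Spark SQL 1.x infer the
    // schema from the strings and build a SchemaRDD.
    val jsonStrings = rdd.map { m =>
      implicit val formats = DefaultFormats
      compact(render(Extraction.decompose(m)))
    }
    val schemaRDD = sqlContext.jsonRDD(jsonStrings)
    schemaRDD.registerTempTable("records") // hypothetical table name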
scala> sql("select cre_ts from pmt limit 1").collect
res16: Array[org.apache.spark.sql.Row] = Array([null])
I created a JIRA for it:
https://issues.apache.org/jira/browse/SPARK-4781
Jianshi
On Sun, Dec 7, 2014 at 1:06 AM, Jianshi Huang
wrote:
> Hmm... another issue I found
Hmm... another issue I found doing this approach is that ANALYZE TABLE ...
COMPUTE STATISTICS will fail to attach the metadata to the table, and later
broadcast join and such will fail...
Any idea how to fix this issue?
Jianshi
On Sat, Dec 6, 2014 at 9:10 PM, Jianshi Huang
wrote:
> V
Very interesting: the line doing DROP TABLE throws an exception. After
removing it, everything works.
Jianshi
On Sat, Dec 6, 2014 at 9:11 AM, Jianshi Huang
wrote:
> Here's the solution I got after talking with Liancheng:
>
> 1) using backquote `..` to wrap up all illegal characte
to drop and re-register the table:
val t = table(name)
val newSchema = StructType(t.schema.fields.map(s =>
  s.copy(name = s.name.replaceAll(".*?::", ""))))
sql(s"drop table $name")
applySchema(t, newSchema).registerTempTable(name)
I'm testing it for now.
Thanks,
create external table pmt (
sorted::id bigint
)
stored as parquet
location '...'
Obviously it didn't work. I also tried removing the identifier sorted::,
but the resulting rows contain only nulls.
Any idea how to create a table in HiveContext from these Parquet files?
Thanks,
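For reference, the backquote workaround from the reply above, spelled out (a sketch, assuming a HiveContext named sqlContext; the location is hypothetical, and the sorted:: prefix still has to be stripped afterwards as shown earlier):

    // Quote the Parquet-generated column name so HiveQL accepts the `::`,
    // then fix the names via the applySchema trick from the earlier message.
    sqlContext.sql("""
      create external table pmt (
        `sorted::id` bigint
      )
      stored as parquet
      location '/path/to/parquet/files'
    """)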
exception in the logs, but that exception does not propagate to user code.
>>
>> On Thu, Dec 4, 2014 at 11:31 PM, Jianshi Huang
>> wrote:
>>
>> > Hi,
>> >
>> > I got exception saying Hive: NoSuchObjectException(message: table
>> > not found)
With Liancheng's suggestion, I've tried setting
spark.sql.hive.convertMetastoreParquet to false,
but ANALYZE with NOSCAN still returns -1 for rawDataSize.
Jianshi
On Fri, Dec 5, 2014 at 3:33 PM, Jianshi Huang
wrote:
> If I run ANALYZE without NOSCAN, then Hive can successfully
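The setting mentioned above, as applied programmatically (a sketch):

    // Fall back to Hive's own Parquet SerDe for metastore tables instead of
    // Spark SQL's native Parquet path; relevant when Hive-side behavior
    // (like statistics handling) differs under the native conversion.
    sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")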
30 PM, Jianshi Huang
wrote:
> Sorry for the late follow-up.
>
> I used Hao's DESC EXTENDED command and found some clue:
>
> new (broadcast broken Spark build):
> parameters:{numFiles=0, EXTERNAL=TRUE, transient_lastDdlTime=1417763892,
> COLUMN_STATS_ACCURATE
Hi,
I got an exception saying Hive: NoSuchObjectException(message: table
not found)
when running "DROP TABLE IF EXISTS "
Looks like a new regression in the Hive module.
Can anyone confirm this?
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
this will print the detailed physical plan.
>
>
>
> Let me know if you still have problem.
>
>
>
> Hao
>
>
>
> *From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
> *Sent:* Thursday, November 27, 2014 10:24 PM
> *To:* Cheng, Hao
> *Cc:* user
> *Subject:* Re: Auto B
I created a ticket for this:
https://issues.apache.org/jira/browse/SPARK-4757
Jianshi
On Fri, Dec 5, 2014 at 1:31 PM, Jianshi Huang
wrote:
> Correction:
>
> According to Liancheng, this hotfix might be the root cause:
>
>
> https://github.com/a
Correction:
According to Liancheng, this hotfix might be the root cause:
https://github.com/apache/spark/commit/38cb2c3a36a5c9ead4494cbc3dde008c2f0698ce
Jianshi
On Fri, Dec 5, 2014 at 12:45 PM, Jianshi Huang
wrote:
> Looks like the datanucleus*.jar shouldn't appear in the hdfs
Looks like the datanucleus*.jar shouldn't appear in the hdfs path in
Yarn-client mode.
Maybe this patch broke yarn-client.
https://github.com/apache/spark/commit/a975dc32799bb8a14f9e1c76defaaa7cfbaf8b53
Jianshi
On Fri, Dec 5, 2014 at 12:02 PM, Jianshi Huang
wrote:
> Act
Actually my HADOOP_CLASSPATH has already been set to include
/etc/hadoop/conf/*
export HADOOP_CLASSPATH=/etc/hbase/conf/hbase-site.xml:/usr/lib/hbase/lib/hbase-protocol.jar:$(hbase classpath)
Jianshi
On Fri, Dec 5, 2014 at 11:54 AM, Jianshi Huang
wrote:
> Looks like somehow Spark failed
SPATH?
Jianshi
On Fri, Dec 5, 2014 at 11:37 AM, Jianshi Huang
wrote:
> I got the following error during Spark startup (Yarn-client mode):
>
> 14/12/04 19:33:58 INFO Client: Uploading resource
> file:/x/home/jianshuang/spark/spark-latest/lib/datanucleus-api-jdo-3.2.6.jar
> ->
ter HEAD yesterday. Is this a bug?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
https://github.com/apache/spark/pull/3270 should be
> another optimization for this.
>
>
>
>
>
> *From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
> *Sent:* Wednesday, November 26, 2014 4:36 PM
> *To:* user
> *Subject:* Auto BroadcastJoin optimization failed in lates
se has met similar situation?
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
/usr/lib/hive/lib doesn’t show any of the parquet
jars, but ls /usr/lib/impala/lib shows the jar we’re looking for as
parquet-hive-1.0.jar
Is it removed from latest Spark?
Jianshi
On Wed, Nov 26, 2014 at 2:13 PM, Jianshi Huang
wrote:
> Hi,
>
> Looks like the latest SparkSQL with Hive 0
)
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
Using the same DDL and Analyze script above.
Jianshi
On Sat, Oct 11, 2014 at 2:18 PM, Jianshi Huang
wrote:
> It works fine, thanks for the help Michael.
>
> Liancheng also told m
> Hello Jianshi,
>
> The reason for that error is that we do not have a Spark SQL data type for
> Scala BigInt. You can use Decimal for your case.
>
> Thanks,
>
> Yin
>
> On Fri, Nov 21, 2014 at 5:11 AM, Jianshi Huang
> wrote:
>
>> Hi,
>>
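A sketch of Yin's suggestion, converting the offending field before registering (all names hypothetical; scala BigDecimal maps to Spark SQL's Decimal type):

    // Spark SQL 1.x has no type mapping for scala.BigInt, so convert it to
    // BigDecimal (-> DecimalType) before turning the RDD into a SchemaRDD.
    case class Raw(id: BigInt, name: String)
    case class Converted(id: BigDecimal, name: String)

    val converted = rawRdd.map(r => Converted(BigDecimal(r.id), r.name))
    // converted.registerTempTable("t") then works via import sqlContext.createSchemaRDD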
Hi,
I got an error during rdd.registerTempTable(...) saying scala.MatchError:
scala.BigInt
Looks like BigInt cannot be used in a SchemaRDD. Is that correct?
So what would you recommend to deal with it?
Thanks,
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog:
exception saying that SparkContext is not serializable,
which is totally irrelevant to txnSentTo.
I heard that in Scala 2.11 the REPL will have much better support to solve
this issue. Is that true?
Could anyone explain why we're having this problem?
Thanks,
--
Jianshi Huang
LinkedIn: jians
hreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> [error] (streaming-kafka/*:update) sbt.ResolveException: unresolved
> dependency: org.apache.kafka#kafka_2.11;0.8.0: not found
> [error] (catalyst/*:update) sbt.ResolveException: unresolved dependency:
> org.scalama
Any notable issues with using Scala 2.11? Is it stable now?
Or can I use Scala 2.11 in my Spark application with a Spark dist built
with 2.10?
I'm looking forward to migrating to 2.11 for some quasiquote features. I
couldn't make them work in 2.10...
Cheers,
--
Jianshi Huang
LinkedI
14, 2014 at 2:49 PM, Jianshi Huang
> wrote:
>
>> Ok, then we need another trick.
>>
>> let's have an *implicit lazy val connection/context* around our code,
>> and setup() will trigger the eval and initialization.
>>
>
> Due to lazy evaluation, I thin
wrap: scala.reflect.internal.MissingRequirementError: object scala.runtime
in compiler mirror not found. -> [Help 1]
Does anyone know what the problem is?
I'm building it on OSX. I didn't have this problem one month ago.
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshu
> If you’re just relying on the side effects of setup() and cleanup(), then
> I think this trick is OK and pretty clean.
>
> But if setup() returns, say, a DB connection, then the map(...) part and
> cleanup() can’t get the connection object.
>
> On 11/14/14 1:20 PM, Jianshi Huang w
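For readers of this thread: the usual Spark analogue of Hadoop's setup()/cleanup() is per-partition initialization inside mapPartitions. A sketch (the connection and processing functions are hypothetical):

    // Open a resource once per partition (setup), use it for every record,
    // and release it when the partition is done (cleanup). Buffering the
    // partition into a List trades memory for a safe close().
    rdd.mapPartitions { iter =>
      val conn = openConnection()                          // setup()
      val results = iter.map(x => process(conn, x)).toList
      conn.close()                                         // cleanup()
      results.iterator
    }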
w-to-translate-from-mapreduce-to-apache-spark/
>
> On 11/14/14 10:44 AM, Dai, Kevin wrote:
>
> Hi all,
>
>
>
> Are there setup and cleanup functions in Spark, as in Hadoop MapReduce,
> which do some initialization and cleanup work?
>
>
>
> Best Regards,
>
&g
n and cleanup work?
>
>
>
> Best Regards,
>
> Kevin.
>
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
needs to be
collected to the driver. Is there a way to avoid doing this?
Thanks
Jianshi
On Mon, Oct 27, 2014 at 4:57 PM, Jianshi Huang
wrote:
> Sure, let's still focus on the streaming simulation use case. It's a very
> useful problem to solve.
>
> If we're going to use th
version of spray + akka + spark are you
> using?
>
> [error]org.scalamacros:quasiquotes _2.10, _2.10.3
> [trace] Stack trace suppressed: run last *:update for the full output.
> [error] (*:update) Conflicting cross-version suffixes in:
> org.scalamacros:quasiquotes
>
>
>
> Can you try a Spray version built with 2.2.x along with Spark 1.1 and
> include the Akka dependencies in your project’s sbt file?
>
>
>
> Mohammed
>
>
>
> *From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
> *Sent:* Tuesday, October 28, 2014 8:58 PM
>
I'm using Spark built from HEAD; I think it uses the modified Akka 2.3.4, right?
Jianshi
On Wed, Oct 29, 2014 at 5:53 AM, Mohammed Guller
wrote:
> Try a version built with Akka 2.2.x
>
>
>
> Mohammed
>
>
>
> *From:* Jianshi Huang [mailto:jianshi.hu...@gmail.co
org.spark-project.akka 2.3.4-spark
it should solve the problem. Makes sense? I'll give it a shot when I have
time; for now I'll probably just not use the Spray client...
Cheers,
Jianshi
On Tue, Oct 28, 2014 at 6:02 PM, Jianshi Huang
wrote:
> Hi,
>
> I got the following exc
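A sketch of that idea in sbt form (coordinates and versions are illustrative): keep Spark's repackaged org.spark-project.akka on the classpath and stop Spray from pulling in a conflicting com.typesafe.akka:

    // build.sbt sketch: exclude Spray's transitive Akka so only Spark's
    // shaded Akka (2.3.4-spark) is resolved.
    libraryDependencies += "io.spray" %% "spray-client" % "1.3.1" excludeAll (
      ExclusionRule(organization = "com.typesafe.akka")
    )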
the exception.
Does anyone have an idea what went wrong? I need help!
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
I have never tried this yet, but maybe you can use an in-memory Derby
> database as metastore
> https://db.apache.org/derby/docs/10.7/devguide/cdevdvlpinmemdb.html
>
> I'll investigate this when I'm free; I guess we can use this for Spark SQL
> Hive support testing.
>
> On 10/27/14 4
Any suggestion? :)
Jianshi
On Thu, Oct 23, 2014 at 3:49 PM, Jianshi Huang
wrote:
> The Kafka stream has 10 topics and the data rate is quite high (~ 100K/s
> per topic).
>
> Which configuration do you recommend?
> - 1 Spark app consuming all Kafka topics
> - 10 separ
pecial DStream.
Jianshi
On Mon, Oct 27, 2014 at 4:44 PM, Shao, Saisai wrote:
> Yes, I understand what you want, but it may be hard to achieve without
> collecting back to the driver node.
>
>
>
> Besides, can we just think of another way to do it?
>
>
>
> Thanks
>